[jira] [Updated] (YARN-8817) [Submarine] In cases when user doesn't ask HDFS path while submitting job but framework requires user to set HDFS related environments

2018-09-24 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8817:
-
Summary: [Submarine] In cases when user doesn't ask HDFS path while 
submitting job but framework requires user to set HDFS related environments  
(was: [Submarine] In some cases HDFS is not asked by user when submit job but 
framework requires user to set HDFS related environments)

> [Submarine] In cases when user doesn't ask HDFS path while submitting job but 
> framework requires user to set HDFS related environments
> --
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> User who submit the job can see the error message like: 
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is 
> being used to read/write models/data. Following envs are required: 1) 
> DOCKER_HADOOP_HDFS_HOME= 2) 
> DOCKER_JAVA_HOME=. You can use --env to 
> pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
> environments
> This error appears even when HDFS is not requested by the user.
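
A minimal sketch of the behavior the updated summary implies (an assumption about the intended fix, not the attached YARN-8817.001.patch): validate the Docker HDFS/Java envs only when some job path actually points at HDFS.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Map;

class HdfsEnvCheckSketch {
  // Hypothetical helper: insist on the Docker envs only if a job path uses HDFS.
  static void checkHdfsEnvs(List<String> jobPaths, Map<String, String> envs)
      throws IOException {
    boolean usesHdfs = jobPaths.stream()
        .anyMatch(p -> p != null && p.startsWith("hdfs://"));
    if (!usesHdfs) {
      return; // HDFS is not used at all, so nothing to validate
    }
    if (!envs.containsKey("DOCKER_HADOOP_HDFS_HOME")
        || !envs.containsKey("DOCKER_JAVA_HOME")) {
      throw new IOException("Failed to detect HDFS-related environments");
    }
  }
}
{code}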



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626804#comment-16626804
 ] 

Hadoop QA commented on YARN-5939:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 0 new + 90 unchanged - 1 fixed = 90 total (was 91) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 9s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 55s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-5939 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941163/YARN-5939.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 349a37e98c4e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 29dad7d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21961/testReport/ |
| Max. process+thread count | 409 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/21961/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |


This message was automatically generated.



> FSDownload leaks FileSystem 

[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626802#comment-16626802
 ] 

Rohith Sharma K S commented on YARN-8815:
-

+1 lgtm..

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8815.001.patch, YARN-8815.002.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> 
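
A hypothetical mitigation sketch (an assumption only; the attached patches may fix this differently): tolerate an invalid stored application during recovery, so a single bad entry cannot abort transitionToActive and leave both RMs in standby.

{code:java}
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

class RecoverySketch {
  // Hypothetical stand-in for the RMAppManager recovery call.
  interface AppRecoverer {
    void recoverApplication(String appId) throws InvalidResourceRequestException;
  }

  // Recover every stored app; one invalid entry should not abort the whole run.
  static void recoverAll(Iterable<String> storedAppIds, AppRecoverer recoverer) {
    for (String appId : storedAppIds) {
      try {
        recoverer.recoverApplication(appId);
      } catch (InvalidResourceRequestException e) {
        // Skip the unrecoverable app instead of failing the RM transition.
        System.err.println("Skipping app " + appId + ": " + e);
      }
    }
  }
}
{code}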

[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-09-24 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626792#comment-16626792
 ] 

Rohith Sharma K S commented on YARN-8627:
-

bq. One thing I noticed is that only the "domainlog" file was present in these 
types of repeated appid directories. Other types such as summarylog/entitylog 
were present only in the normal expected directory structure.
Interesting!! It means retrieving these entities _*had a problem, i.e. with 
ACLs, but unidentified*_. Can we see the content of both domain log files? I 
would still like to find the root cause of this issue! I suspect something 
goes wrong while moving from the active to the done directory.
Some doubts:
# Is this happening on HDFS or on some other filesystem?
# Is the cluster enabled with the 
*yarn.timeline-service.entity-group-fs-store.with-user-dir* flag?


> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner thread exits with the following ERROR every time it 
> runs.
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread is scheduled, a different folder encounters the error. 
> As a result, the thread cannot clean all of the old done directories, since 
> it stops after this error.
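
A sketch of the kind of hardening this suggests (illustrative only, not necessarily what the attached patches do): catch FileNotFoundException per application directory, so one concurrently deleted directory cannot abort the whole cleaning pass.

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class CleanerSketch {
  void cleanLogs(FileSystem fs, Path doneRoot) throws IOException {
    for (FileStatus appDir : fs.listStatus(doneRoot)) {
      try {
        // The real code first checks shouldCleanAppLogDir(); a directory can
        // vanish between the listing and the delete, hence the catch below.
        fs.delete(appDir.getPath(), true);
      } catch (FileNotFoundException e) {
        // Already gone; log and continue with the remaining directories.
        System.err.println("Skipping missing dir: " + appDir.getPath());
      }
    }
  }
}
{code}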



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM

2018-09-24 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626785#comment-16626785
 ] 

Akira Ajisaka commented on YARN-8816:
-

Okay. I'll reopen HADOOP-15764 and close this issue.

> YARN Unit Tests Fail with Ubuntu VM
> ---
>
> Key: YARN-8816
> URL: https://issues.apache.org/jira/browse/YARN-8816
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8816.01.patch
>
>
> {code}
> Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> {code}
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
> [ERROR] 
> testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands)
>   Time elapsed: 2.668 s  <<< ERROR!
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
>   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
>   at 
> org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.(SecurityUtil.java:593)
>   at 
> 

[jira] [Updated] (YARN-5939) FSDownload leaks FileSystem resources

2018-09-24 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-5939:
--
Attachment: YARN-5939.005.patch

> FSDownload leaks FileSystem resources
> -
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1, 2.7.3
>Reporter: liuxiangwei
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.005.patch, 
> YARN-5939.01.patch, YARN-5939.02.patch, YARN-5939.03.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the configuration item 
> "fs.%s.impl.disable.cache" should be set to true.
> In YARN's source code, the class named 
> "org.apache.hadoop.yarn.util.FSDownload" uses getFileSystem but never closes 
> it, which leads to a file descriptor leak, because our self-defined 
> FileSystem class closes the file descriptor when the close function is 
> invoked.
> My questions below:
> 1. Is invoking "getFileSystem" but never closing it YARN's expected behavior?
> 2. What should we do in our self-defined FileSystem to resolve it?
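
For illustration, a minimal sketch of the leak and the obvious fix (a sketch under the assumptions above, not the attached patch): with fs.<scheme>.impl.disable.cache=true, every getFileSystem() call returns a fresh instance, so each instance must be closed explicitly.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class CloseFsSketch {
  static void statWithClose(Path src, Configuration conf) throws IOException {
    // Caching disabled means this returns a new FileSystem instance...
    try (FileSystem fs = src.getFileSystem(conf)) {
      fs.getFileStatus(src); // ...use it...
    } // ...and try-with-resources closes it, releasing the file descriptor.
  }
}
{code}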



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources

2018-09-24 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626770#comment-16626770
 ] 

Weiwei Yang commented on YARN-5939:
---

Hi [~bsteinbach]

Appreciate your help to rebase it.

The wrapper class simply extends {{LocalFileSystem}} and counts every 
initialize and close call, since we expect the file system to always be 
closed to avoid leaking resources. It is only used in the test class.
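
For reference, a rough sketch of such a counting wrapper (names and details are illustrative, not necessarily those in the patch):

{code:java}
import java.io.IOException;
import java.net.URI;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalFileSystem;

public class TrackingLocalFileSystem extends LocalFileSystem {
  private static final AtomicInteger OPEN_COUNT = new AtomicInteger();

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    OPEN_COUNT.incrementAndGet(); // one more live instance
    super.initialize(name, conf);
  }

  @Override
  public void close() throws IOException {
    OPEN_COUNT.decrementAndGet(); // instance released
    super.close();
  }

  // A test can assert this returns to zero once FSDownload finishes.
  public static int openCount() {
    return OPEN_COUNT.get();
  }
}
{code}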

 

> FSDownload leaks FileSystem resources
> -
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1, 2.7.3
>Reporter: liuxiangwei
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.01.patch, 
> YARN-5939.02.patch, YARN-5939.03.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the configuration item 
> "fs.%s.impl.disable.cache" should be set to true.
> In YARN's source code, the class named 
> "org.apache.hadoop.yarn.util.FSDownload" uses getFileSystem but never closes 
> it, which leads to a file descriptor leak, because our self-defined 
> FileSystem class closes the file descriptor when the close function is 
> invoked.
> My questions below:
> 1. Is invoking "getFileSystem" but never closing it YARN's expected behavior?
> 2. What should we do in our self-defined FileSystem to resolve it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM

2018-09-24 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626765#comment-16626765
 ] 

Bibin A Chundatt commented on YARN-8816:


[~ajisakaa]

I think it's better to add this as an addendum patch for HADOOP-15764?

> YARN Unit Tests Fail with Ubuntu VM
> ---
>
> Key: YARN-8816
> URL: https://issues.apache.org/jira/browse/YARN-8816
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8816.01.patch
>
>
> {code}
> Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> {code}
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
> [ERROR] 
> testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands)
>   Time elapsed: 2.668 s  <<< ERROR!
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
>   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
>   at 
> org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> Caused by: java.lang.NullPointerException
>   at 
> 

[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-09-24 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626764#comment-16626764
 ] 

Weiwei Yang commented on YARN-8657:
---

Hi [~sunilg], it seems the UT failure is related; I tried it locally and it 
seems reproducible. Can you please check?

> User limit calculation should be read-lock-protected within LeafQueue
> -
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
>
> When async scheduling is enabled, the user limit calculation could be wrong: 
> it is possible that the scheduler calculated a user_limit, but inside 
> {{canAssignToUser}} it has become stale. 
> We need to protect the user limit calculation.
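
A minimal, self-contained sketch of the read-lock pattern the title suggests (illustrative; LeafQueue's real fields and signatures differ):

{code:java}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class UserLimitSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private long userLimitMb; // updated elsewhere under the write lock

  boolean canAssign(long requestedMb, long usedMb) {
    lock.readLock().lock();
    try {
      // Read the limit and compare it against usage under one read lock, so a
      // concurrent async-scheduling thread cannot make the value stale between
      // the computation and the check.
      return usedMb + requestedMb <= userLimitMb;
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}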



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM

2018-09-24 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-8816:

Attachment: YARN-8816.01.patch

> YARN Unit Tests Fail with Ubuntu VM
> ---
>
> Key: YARN-8816
> URL: https://issues.apache.org/jira/browse/YARN-8816
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: YARN-8816.01.patch
>
>
> {code}
> Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> {code}
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
> [ERROR] 
> testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands)
>   Time elapsed: 2.668 s  <<< ERROR!
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
>   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
>   at 
> org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.(SecurityUtil.java:593)
>   at 
> 

[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8815:
---
Attachment: YARN-8815.002.patch

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8815.001.patch, YARN-8815.002.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> 

[jira] [Assigned] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM

2018-09-24 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reassigned YARN-8816:
---

Assignee: Akira Ajisaka

> YARN Unit Tests Fail with Ubuntu VM
> ---
>
> Key: YARN-8816
> URL: https://issues.apache.org/jira/browse/YARN-8816
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: Akira Ajisaka
>Priority: Major
>
> {code}
> Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> {code}
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
> [ERROR] 
> testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands)
>   Time elapsed: 2.668 s  <<< ERROR!
> java.lang.ExceptionInInitializerError
>   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
>   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
>   at 
> org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:143)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:139)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.(SecurityUtil.java:593)
>   at 
> org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129)
>   at 
> 

[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8815:
---
Priority: Critical  (was: Major)

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> 

[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-24 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626714#comment-16626714
 ] 

Weiwei Yang commented on YARN-8468:
---

Hi [~haibochen]/[~bsteinbach]
{quote}{{ApplicationMasterProtocol}} is the connection path between AMs and RM 
for non-AM container requests. For AM-container requests, the normalization is 
done by RMAppManager.
{quote}
That's correct. What I was not clear about was why we have changes in (*#1*) 
RMAppManager ({{ApplicationClientProtocol}}), DefaultAMSProcessor, and 
PlacementConstraintProcessor ({{ApplicationMasterProtocol}}), but also changes 
in (*#2*) {{FS/CS#allocate}}. I thought we could do just #1 or #2, not both 
... Given the situation now, all of #1 seems done while #2 is incomplete (see 
my last comment for the CS side: it does normalization against neither the 
queue-level max-resource nor the updated requests). Can we enforce the 
validation & normalization in #1? That would also help reduce the overhead in 
the scheduler. So I am thinking we could:
 * Revert the API changes in YarnScheduler, AbstractYarnScheduler
 * Refine the changes in PlacementConstraintProcessor, RMAppManager

changes like
{code:java}
this.scheduler.getNormalizedResource(reqResource, maxAllocation);
{code}
can be replaced with utility methods:
{code:java}
SchedulerUtils.getNormalizedResource(...)
// or
RMServerUtils.normalizeAndValidateRequests(...){code}
Would that simplify the patch? Just a thought; things around normalization 
seem over-complex now, and I am trying to sort out whether we can make 
minimal changes to get this working. Please let me know if that makes sense, 
thanks.

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb provides a default maximum container 
> size for all queues, while the per-queue maximum is set with the 
> “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * Enforce the use of queue based maximum allocation limit if it is 
> available, if not use the general scheduler level setting
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submit and resource request
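
A minimal, self-contained sketch of the per-queue lookup idea above (hypothetical; the real implementation would live in FSParentQueue/FSLeafQueue and use Resource objects rather than plain megabytes):

{code:java}
import java.util.Map;

class MaxAllocationSketch {
  private final long clusterMaxMb;            // yarn.scheduler.maximum-allocation-mb
  private final Map<String, Long> queueMaxMb; // per-queue maxContainerResources

  MaxAllocationSketch(long clusterMaxMb, Map<String, Long> queueMaxMb) {
    this.clusterMaxMb = clusterMaxMb;
    this.queueMaxMb = queueMaxMb;
  }

  long getMaximumResourceCapability(String queueName) {
    // Fall back to the scheduler-wide limit when the queue sets nothing, and
    // never let a queue exceed the scheduler-wide cap.
    Long queueMax = queueMaxMb.get(queueName);
    return queueMax == null ? clusterMaxMb : Math.min(queueMax, clusterMaxMb);
  }
}
{code}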



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Rakesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Shah updated YARN-8815:
--
Priority: Major  (was: Critical)

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> 

[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8815:
---
Target Version/s: 3.2.0, 3.1.2  (was: 3.2.0)

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> 

[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626694#comment-16626694
 ] 

Sunil Govindan commented on YARN-8817:
--

Thanks [~leftnoteasy]. The fix looks straightforward.

Committing shortly.

> [Submarine] In some cases HDFS is not asked by user when submit job but 
> framework requires user to set HDFS related environments
> 
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> The user who submits the job can see an error message like: 
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is 
> being used to read/write models/data. Followingenvs are required: 1) 
> DOCKER_HADOOP_HDFS_HOME= 2) 
> DOCKER_JAVA_HOME=. You can use --env to 
> pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
> environments
> This happens even if HDFS is not requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-24 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.7.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, 
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, 
> YARN-8789.7.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} held a very large number of items and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some cleanup, 
> simplification, and the ability to specify a bounded queue so that incoming 
> events are throttled until they can be processed (a sketch of the idea 
> follows this description).  This will protect the ApplicationMaster from a 
> flood of events.
> Logging Message:
> Size of event-queue is xxx
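
Below is a minimal, self-contained Java sketch of the bounded-queue idea 
described in this issue. It is illustrative only, not the attached patch; the 
class name and the capacity value are assumptions. A full queue makes put() 
block, so event producers are throttled instead of growing the heap without 
limit.

{noformat}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedDispatcherSketch {
  // Hypothetical capacity; a real patch would make this configurable.
  private final BlockingQueue<Runnable> eventQueue =
      new LinkedBlockingQueue<>(10_000);

  // Producer side: blocks when the queue is full, throttling callers.
  public void dispatch(Runnable event) throws InterruptedException {
    eventQueue.put(event);
  }

  // Consumer side: a single thread drains and handles events in order.
  public void serviceLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      eventQueue.take().run();
    }
  }
}
{noformat}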



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626680#comment-16626680
 ] 

Hadoop QA commented on YARN-8789:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
52s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 
58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 41s{color} | {color:orange} root: The patch generated 8 new + 889 unchanged 
- 11 fixed = 897 total (was 900) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
48s{color} | {color:red} hadoop-common-project/hadoop-common generated 1 new + 
0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  3s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
41s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}192m 59s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
19s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 47s{color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}323m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-common-project/hadoop-common |
|  |  Dead store to useIp in 
org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(Configuration) 
 At 

[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626675#comment-16626675
 ] 

Hadoop QA commented on YARN-8789:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 39s{color} | {color:orange} root: The patch generated 8 new + 889 unchanged 
- 11 fixed = 897 total (was 900) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
52s{color} | {color:red} hadoop-common-project/hadoop-common generated 1 new + 
0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 14s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
40s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}190m 31s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m  
5s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m  8s{color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
54s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}333m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-common-project/hadoop-common |
|  |  Dead store to useIp in 
org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(Configuration) 
 At 

[jira] [Assigned] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-8815:
--

Assignee: Bibin A Chundatt

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> 

[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626672#comment-16626672
 ] 

Hadoop QA commented on YARN-8800:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange}  0m  
8s{color} | {color:orange} The patch generated 74 new + 0 unchanged - 0 fixed = 
74 total (was 0) {color} |
| {color:red}-1{color} | {color:red} shellcheck {color} | {color:red}  0m  
0s{color} | {color:red} The patch generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0) {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
33s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 25 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch 5 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
4s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
48s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8800 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941133/YARN-8800.002.patch |
| Optional Tests |  dupname  asflicense  mvnsite  xml  compile  javac  javadoc  
mvninstall  unit  shadedclient  shellcheck  shelldocs  pylint  |
| uname | Linux 266a81df1c77 

[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626660#comment-16626660
 ] 

Hadoop QA commented on YARN-8758:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 16s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 1 new + 
13 unchanged - 0 fixed = 14 total (was 13) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 
50s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8758 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941126/YARN-8758.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bb9963491f93 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 29dad7d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21956/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21956/testReport/ |
| Max. process+thread count | 684 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626619#comment-16626619
 ] 

Wangda Tan commented on YARN-8817:
--

Verified this in a cluster.

> [Submarine] In some cases HDFS is not asked by user when submit job but 
> framework requires user to set HDFS related environments
> 
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> The user who submits the job can see an error message like: 
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is 
> being used to read/write models/data. Followingenvs are required: 1) 
> DOCKER_HADOOP_HDFS_HOME= 2) 
> DOCKER_JAVA_HOME=. You can use --env to 
> pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
> environments
> This happens even if HDFS is not requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626618#comment-16626618
 ] 

Wangda Tan commented on YARN-8800:
--

Attached ver.2 patch, which includes notebook examples as well.

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626617#comment-16626617
 ] 

Hadoop QA commented on YARN-8817:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
32s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine 
in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
34s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8817 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941121/YARN-8817.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f796a129781b 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 29dad7d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/21955/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21955/testReport/ |
| Max. process+thread count | 395 (vs. ulimit of 1) |
| modules | C: 

[jira] [Updated] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8800:
-
Attachment: YARN-8800.002.patch

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch, YARN-8800.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-24 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626601#comment-16626601
 ] 

Zian Chen commented on YARN-8758:
-

Hi [~sunilg] [~weiweiyagn666], could you help review the patch? Thanks

 

> PreemptionMessage when using AMRMClientAsync
> 
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.1
>Reporter: Krishna Kishore
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8758.001.patch
>
>
> Hi,
>    The preemption notification messages sent within the time period defined 
> by the following parameter currently work only with AMRMClient, not with 
> AMRMClientAsync.
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our 
> implementations are based on it. 
>  
> Thanks,
> Kishore
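
For context, here is a sketch of how the blocking AMRMClient already surfaces 
the preemption message through AllocateResponse; AMRMClientAsync's 
CallbackHandler has no equivalent hook, which is what this issue asks to add. 
The register arguments are placeholders and error handling is elided.

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class PreemptionPollSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> client =
        AMRMClient.createAMRMClient();
    client.init(new Configuration());
    client.start();
    client.registerApplicationMaster("localhost", 0, "");

    // Each heartbeat may carry a PreemptionMessage; an AM that sees one
    // can checkpoint or release containers before max_wait_before_kill.
    AllocateResponse response = client.allocate(0.0f);
    PreemptionMessage pm = response.getPreemptionMessage();
    if (pm != null) {
      System.out.println("Preemption requested: " + pm);
    }
  }
}
{noformat}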



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-24 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8758:

Attachment: YARN-8758.001.patch

> PreemptionMessage when using AMRMClientAsync
> 
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.1
>Reporter: Krishna Kishore
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8758.001.patch
>
>
> Hi,
>    The preemption notification messages sent within the time period defined 
> by the following parameter currently work only with AMRMClient, not with 
> AMRMClientAsync.
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our 
> implementations are based on it. 
>  
> Thanks,
> Kishore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user

2018-09-24 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626598#comment-16626598
 ] 

Billie Rinaldi commented on YARN-8734:
--

Makes sense to me. Thanks, [~eyang]!

> Readiness check for remote service belongs to the same user
> ---
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch, 
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, 
> YARN-8734.005.patch
>
>
> When a service is deploying, it can depend on a remote service.  It 
> would be nice to describe ZooKeeper as a dependency, wait until that service 
> has reached a stable state, and then deploy HBase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8616) systemClock should be used in RMAppImpl instead of System.currentTimeMillis() to be consistent

2018-09-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626572#comment-16626572
 ] 

Hudson commented on YARN-8616:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15047 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15047/])
YARN-8616. systemClock should be used in RMAppImpl instead of (haibochen: rev 
29dad7d258c621a0ff3a64c595a2e32c66c59d11)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
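
The pattern behind this commit, sketched with Hadoop's Clock/SystemClock types 
(the surrounding class is a made-up stand-in, not RMAppImpl itself): 
timestamps come from an injectable clock, so tests can substitute a 
controllable one.

{noformat}
import org.apache.hadoop.yarn.util.Clock;
import org.apache.hadoop.yarn.util.SystemClock;

class TimestampedApp {
  private final Clock systemClock;
  private long startTime;

  TimestampedApp(Clock clock) {
    // Tests can pass a controllable Clock implementation here.
    this.systemClock = clock;
  }

  void markStarted() {
    // Instead of System.currentTimeMillis():
    this.startTime = systemClock.getTime();
  }

  long getStartTime() {
    return startTime;
  }
}

// Production wiring: new TimestampedApp(SystemClock.getInstance());
{noformat}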


> systemClock should be used in RMAppImpl instead of System.currentTimeMillis() 
> to be consistent
> -
>
> Key: YARN-8616
> URL: https://issues.apache.org/jira/browse/YARN-8616
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8616.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626568#comment-16626568
 ] 

Wangda Tan commented on YARN-8817:
--

The root cause is: 

In all cases we generate checkpoint_path and saved_model_path for jobs, and by 
default we use HDFS as the default FS. However, it is possible that a job will 
not use them. So instead of relying on checkpoint_path / saved_model_path, we 
now check the launch command for HDFS usage.
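
A simplified sketch of that check (method and class names here are assumptions 
for illustration, not the actual patch):

{noformat}
import java.io.IOException;
import java.util.Map;

public final class HdfsUsageCheckSketch {
  static boolean launchCommandUsesHdfs(String launchCommand) {
    return launchCommand != null
        && launchCommand.toLowerCase().contains("hdfs://");
  }

  // Require the HDFS-related envs only when the launch command
  // actually references an hdfs:// URI.
  static void validateEnvs(String launchCommand, Map<String, String> envs)
      throws IOException {
    if (launchCommandUsesHdfs(launchCommand)
        && (!envs.containsKey("DOCKER_HADOOP_HDFS_HOME")
            || !envs.containsKey("DOCKER_JAVA_HOME"))) {
      throw new IOException("Failed to detect HDFS-related environments");
    }
  }
}
{noformat}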

> [Submarine] In some cases HDFS is not asked by user when submit job but 
> framework requires user to set HDFS related environments
> 
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> The user who submits the job can see an error message like: 
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is 
> being used to read/write models/data. Followingenvs are required: 1) 
> DOCKER_HADOOP_HDFS_HOME= 2) 
> DOCKER_JAVA_HOME=. You can use --env to 
> pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
> environments
> This happens even if HDFS is not requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626569#comment-16626569
 ] 

Wangda Tan commented on YARN-8817:
--

cc: [~sunilg]

> [Submarine] In some cases HDFS is not asked by user when submit job but 
> framework requires user to set HDFS related environments
> 
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> The user who submits the job can see an error message like: 
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is 
> being used to read/write models/data. Followingenvs are required: 1) 
> DOCKER_HADOOP_HDFS_HOME= 2) 
> DOCKER_JAVA_HOME=. You can use --env to 
> pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
> environments
> This happens even if HDFS is not requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8817:
-
Attachment: YARN-8817.001.patch

> [Submarine] In some cases HDFS is not asked by user when submit job but 
> framework requires user to set HDFS related environments
> 
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> The user who submits the job can see an error message like: 
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is 
> being used to read/write models/data. Followingenvs are required: 1) 
> DOCKER_HADOOP_HDFS_HOME= 2) 
> DOCKER_JAVA_HOME=. You can use --env to 
> pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
> environments
> This happens even if HDFS is not requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-8817:


Assignee: Wangda Tan

> [Submarine] In some cases HDFS is not asked by user when submit job but 
> framework requires user to set HDFS related environments
> 
>
> Key: YARN-8817
> URL: https://issues.apache.org/jira/browse/YARN-8817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8817.001.patch
>
>
> The user who submits the job can see an error message like: 
> 18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is 
> being used to read/write models/data. Followingenvs are required: 1) 
> DOCKER_HADOOP_HDFS_HOME= 2) 
> DOCKER_JAVA_HOME=. You can use --env to 
> pass these envars.
> Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
> environments
> This happens even if HDFS is not requested.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user

2018-09-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626565#comment-16626565
 ] 

Eric Yang commented on YARN-8734:
-

[~billie.rinaldi] Good catch on the relationship between the liveness monitor 
and AM startup.  It makes sense to do the dependency check after the AM has 
registered with the RM, so the dependency wait doesn't get expired by the RM 
when a high-level coordinator submits applications and expects the jobs to 
queue and sort out their dependencies automatically.  I'll revise the patch 
accordingly if you agree.
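
A generic sketch of such a post-registration dependency wait (all names here 
are hypothetical; this is not the attached patch):

{noformat}
import java.util.concurrent.TimeUnit;

interface RemoteServiceProbe {
  // Hypothetical probe, e.g. backed by the YARN service REST API.
  boolean isStable(String serviceName) throws Exception;
}

class DependencyWait {
  // Called after the AM has registered with the RM, so the wait runs
  // inside the AM's own lifetime rather than before it.
  static void awaitStable(RemoteServiceProbe probe, String service,
      long timeoutMs) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!probe.isStable(service)) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException(service + " not stable in time");
      }
      TimeUnit.SECONDS.sleep(5);  // re-check periodically
    }
  }
}
{noformat}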

> Readiness check for remote service belongs to the same user
> ---
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch, 
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, 
> YARN-8734.005.patch
>
>
> When a service is deploying, it can depend on a remote service.  It 
> would be nice to describe ZooKeeper as a dependency, wait until that service 
> has reached a stable state, and then deploy HBase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8804) resourceLimits may be wrongly calculated when leaf-queue is blocked in cluster with 3+ level queues

2018-09-24 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626564#comment-16626564
 ] 

Jason Lowe commented on YARN-8804:
--

Thanks for updating the patch!  +1 lgtm.  I'll commit this by Wednesday if 
there are no objections.

> resourceLimits may be wrongly calculated when leaf-queue is blocked in 
> cluster with 3+ level queues
> ---
>
> Key: YARN-8804
> URL: https://issues.apache.org/jira/browse/YARN-8804
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8804.001.patch, YARN-8804.002.patch, 
> YARN-8804.003.patch
>
>
> This problem is due to YARN-4280: a parent queue deducts a child queue's 
> headroom when the child queue has reached its resource limit and the skipped 
> type is QUEUE_LIMIT. The resource limits of the deepest parent queues are 
> correctly calculated, but for a non-deepest parent queue, its headroom may be 
> much more than the sum of its reached-limit child queues' headroom, so the 
> resource limit computed after a non-deepest parent may be much less than its 
> true value and block the allocation for later queues.
> To reproduce this problem with a UT:
>  (1) The cluster has two nodes, each with resource <10GB, 10core>, and 
> 3-level queues as below; among them the max-capacity of "c1" is 10 and all 
> others are 100, so the max-capacity of queue "c1" is <2GB, 2core>
> {noformat}
>         Root
>        /  |  \
>       a   b   c
>      10  20  70
>            |  \
>           c1   c2
>   10(max=10)   90
> {noformat}
> (2) Submit app1 to queue "c1" and launch am1 (resource=<1GB, 1core>) on nm1
>  (3) Submit app2 to queue "b" and launch am2 (resource=<1GB, 1core>) on nm1
>  (4) app1 and app2 each ask for one <2GB, 1core> container. 
>  (5) nm1 does one heartbeat.
>  Now queue "c" has a lower used-capacity percentage than queue "b", so the 
> allocation sequence will be "a" -> "c" -> "b".
>  Queue "c1" has reached its queue limit, so the requests of app1 are pending; 
>  the headroom of queue "c1" is <1GB, 1core> (= max-capacity - used), 
>  the headroom of queue "c" is <18GB, 18core> (= max-capacity - used). 
>  After allocation for queue "c", the resource limit of queue "b" is wrongly 
> calculated as <2GB, 2core>, 
>  so the headroom of queue "b" is <1GB, 1core> (= resource-limit - used) 
>  and the scheduler won't allocate a container for app2 on nm1.
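To make the arithmetic concrete, here is a hand calculation from the numbers 
above (not actual scheduler output), showing how deducting all of queue "c"'s 
headroom, instead of only blocked "c1"'s, starves "b":

{noformat}
cluster             = <20GB, 20core>                (2 nodes x <10GB, 10core>)
headroom(c1)        = <2GB, 2core> - <1GB, 1core> = <1GB, 1core>  (blocked leaf)
headroom(c)         = <18GB, 18core>                (per the numbers above)
limit(b), buggy     = cluster - headroom(c)       = <2GB, 2core>
                      -> headroom(b) = <1GB, 1core>: too small for <2GB, 1core>
limit(b), expected  = cluster - headroom(c1)      = <19GB, 19core>
                      -> the <2GB, 1core> container fits on nm1
{noformat}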



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8817) [Submarine] In some cases HDFS is not asked by user when submit job but framework requires user to set HDFS related environments

2018-09-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8817:


 Summary: [Submarine] In some cases HDFS is not asked by user when 
submit job but framework requires user to set HDFS related environments
 Key: YARN-8817
 URL: https://issues.apache.org/jira/browse/YARN-8817
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: submarine
Reporter: Wangda Tan


A user who submits the job can see an error message like: 

18/09/24 23:12:58 ERROR yarnservice.YarnServiceJobSubmitter: When hdfs is being 
used to read/write models/data. Followingenvs are required: 1) 
DOCKER_HADOOP_HDFS_HOME= 2) 
DOCKER_JAVA_HOME=. You can use --env to pass 
these envars.
Exception in thread "main" java.io.IOException: Failed to detect HDFS-related 
environments

This happens even if HDFS is not requested.
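For reference, a hypothetical invocation that passes the two variables (the 
launcher jar name and both paths are placeholders, not values from this issue):

{noformat}
yarn jar hadoop-yarn-submarine-<version>.jar job run \
  ... \
  --env DOCKER_HADOOP_HDFS_HOME=/hadoop-current \
  --env DOCKER_JAVA_HOME=/usr/lib/jvm/java-8-openjdk
{noformat}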



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6338) Typos in Docker docs: contains => containers

2018-09-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626550#comment-16626550
 ] 

Hudson commented on YARN-6338:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15046 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15046/])
YARN-6338. Typos in Docker docs: contains => containers. (Contributed by 
(haibochen: rev cf62ff9a6a48b97cd93b405e13b58dbbaea1925f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md


> Typos in Docker docs: contains => containers
> 
>
> Key: YARN-6338
> URL: https://issues.apache.org/jira/browse/YARN-6338
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Zoltan Siegl
>Priority: Minor
>  Labels: docs
> Fix For: 3.2.0
>
> Attachments: YARN-6338.001.patch
>
>
> "allowed to request privileged contains" should be "allowed to request 
> privileged containers"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8616) systemClock should be used in RMAppImpl instead of System.currentTimeMillis() to be consistent

2018-09-24 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8616:
-
Summary: systemClock should be used in RMAppImpl instead of 
System.currentTimeMillis() to be consistent  (was: System.currentTimeMillis() 
used in RMAppImpl, instead of getting value from systemClock)

> systemClock should be used in RMAppImpl instead of System.currentTimeMillis() 
> to be consistent
> -
>
> Key: YARN-8616
> URL: https://issues.apache.org/jira/browse/YARN-8616
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8616.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8616) System.currentTimeMillis() used in RMAppImpl, instead of getting value from systemClock

2018-09-24 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626544#comment-16626544
 ] 

Haibo Chen commented on YARN-8616:
--

+1 on the patch. Thanks [~snemeth] for the fix, committing shortly.

> System.currentTimeMillis() used in RMAppImpl, instead of getting value from 
> systemClock
> ---
>
> Key: YARN-8616
> URL: https://issues.apache.org/jira/browse/YARN-8616
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8616.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6338) Typos in Docker docs: contains => containers

2018-09-24 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626536#comment-16626536
 ] 

Haibo Chen commented on YARN-6338:
--

+1 on the patch. Committing it shortly.

> Typos in Docker docs: contains => containers
> 
>
> Key: YARN-6338
> URL: https://issues.apache.org/jira/browse/YARN-6338
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Zoltan Siegl
>Priority: Minor
>  Labels: docs
> Attachments: YARN-6338.001.patch
>
>
> "allowed to request privileged contains" should be "allowed to request 
> privileged containers"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures

2018-09-24 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8759:
-
Labels: unit-test  (was: )

> Copy of "resource-types.xml" is not deleted if test fails, causes other test 
> failures
> -
>
> Key: YARN-8759
> URL: https://issues.apache.org/jira/browse/YARN-8759
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: unit-test
> Attachments: YARN-8759.001.patch, YARN-8759.002.patch, 
> YARN-8759.003.patch, YARN-8759.004.patch
>
>
> In several tests, resource-types.xml is copied to the test machine, but it is 
> deleted only at the end of the test. If a test fails, the file is not deleted 
> and other tests fail because of the stale configuration.
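A minimal JUnit-style sketch of the fix direction (illustrative, not the 
attached patch): delete the copied file in teardown so a failing test body 
cannot leak it into later tests.

{code:java}
// Sketch only: illustrative names, not the attached patch.
private File resourceTypesFile;

@Before
public void setUp() throws IOException {
  resourceTypesFile = new File(confDir, "resource-types.xml");
  Files.copy(sourceXml.toPath(), resourceTypesFile.toPath());
}

@After
public void tearDown() {
  // Runs whether the test passed or failed, so the copy never leaks
  // into later tests' configuration.
  if (resourceTypesFile != null) {
    resourceTypesFile.delete();
  }
}
{code}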



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8734) Readiness check for remote service belongs to the same user

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8734:
-
Summary: Readiness check for remote service belongs to the same user  (was: 
Readiness check for remote service)

> Readiness check for remote service belongs to the same user
> ---
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch, 
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, 
> YARN-8734.005.patch
>
>
> When a service is deploying, it can have a dependency on a remote service.  It 
> would be nice to describe ZooKeeper as a dependency, wait until that service 
> has reached a stable state, and then deploy HBase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-24 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626513#comment-16626513
 ] 

Zian Chen commented on YARN-8758:
-

I'll work on this Jira and provide an initial patch.

> PreemptionMessage when using AMRMClientAsync
> 
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.1
>Reporter: Krishna Kishore
>Priority: Major
>
> Hi,
>    The preemption notification messages sent in the time period defined by 
> the following parameter currently work only with AMRMClient, not with 
> AMRMClientAsync:
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our 
> implementations are based on it. 
>  
> Thanks,
> Kishore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-24 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen reassigned YARN-8758:
---

Assignee: Zian Chen

> PreemptionMessage when using AMRMClientAsync
> 
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.1
>Reporter: Krishna Kishore
>Assignee: Zian Chen
>Priority: Major
>
> Hi,
>    The preemption notification messages sent in the time period defined by 
> the following parameter currently work only with AMRMClient, not with 
> AMRMClientAsync:
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our 
> implementations are based on it. 
>  
> Thanks,
> Kishore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8758:
-
Target Version/s: 3.1.2  (was: 3.1.1)

> PreemptionMessage when using AMRMClientAsync
> 
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.1
>Reporter: Krishna Kishore
>Priority: Major
>
> Hi,
>    The preemption notification messages sent in the time period defined by 
> the following parameter currently work only with AMRMClient, not with 
> AMRMClientAsync:
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our 
> implementations are based on it. 
>  
> Thanks,
> Kishore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8758:
-
Fix Version/s: (was: 2.7.6)

> PreemptionMessage when using AMRMClientAsync
> 
>
> Key: YARN-8758
> URL: https://issues.apache.org/jira/browse/YARN-8758
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.1.1
>Reporter: Krishna Kishore
>Priority: Major
>
> Hi,
>    The preemption notification messages sent in the time period defined by 
> the following parameter currently work only with AMRMClient, not with 
> AMRMClientAsync:
> *yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*
> We want this to work with AMRMClientAsync as well, because our 
> implementations are based on it. 
>  
> Thanks,
> Kishore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7644) NM gets backed up deleting docker containers

2018-09-24 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626455#comment-16626455
 ] 

Chandni Singh edited comment on YARN-7644 at 9/24/18 9:14 PM:
--

For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and 
{{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates tasks 
and submits them to the executor, so they are performed in a non-blocking way:
{code:java}
containerLauncher.submit(launch);
{code}
However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}}, 
{{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, and {{RESUME_CONTAINER}}, the actions 
are performed in a blocking way:
{code:java}
 launcher.cleanupContainer();
{code}
With this Jira, I can focus on making the {{CLEANUP_CONTAINER}} and 
{{CLEANUP_CONTAINER_FOR_REINIT}} events non-blocking.  

It doesn't look like the caller ({{ContainerImpl}}) waits anywhere for 
{{cleanupContainer()}} to complete synchronously. It is triggered by 
dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events.

 

cc. [~ebadger] [~jlowe]


was (Author: csingh):
For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and 
{{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates tasks 
and submits them to the executor, so they are performed in a non-blocking way:
{code:java}
containerLauncher.submit(launch);
{code}
However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}}, 
{{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, and {{RESUME_CONTAINER}}, the actions 
are performed in a blocking way:
{code:java}
 launcher.cleanupContainer();
{code}
With this Jira, I can focus on making the {{CLEANUP_CONTAINER}} and 
{{CLEANUP_CONTAINER_FOR_REINIT}} events non-blocking.  

It doesn't look like the caller ({{ContainerImpl}}) waits anywhere for 
{{cleanupContainer()}} to complete synchronously. It is triggered by 
dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events.

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
>
> We send a {{docker stop}} to the docker container, with a timeout of 10 
> seconds, when we shut down a container. If the container does not stop after 
> 10 seconds, we force kill it. However, the {{docker stop}} command is a 
> blocking call, so in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for each {{docker stop}} to 
> return. This ties up the ContainerLaunch handler, so these kill events back 
> up, and it appears to back up new container launches as well. 
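Shown as plain docker commands for clarity (illustrative; the NM drives this 
through its container runtime rather than a shell):

{noformat}
docker stop --time=10 <container-id>   # blocks for up to 10s after the SIGTERM
docker kill <container-id>             # force kill if it is still running
{noformat}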



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers

2018-09-24 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626455#comment-16626455
 ] 

Chandni Singh commented on YARN-7644:
-

For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and 
{{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates tasks 
and submits them to the executor, so they are performed in a non-blocking way:
{code:java}
containerLauncher.submit(launch);
{code}
However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}}, 
{{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, and {{RESUME_CONTAINER}}, the actions 
are performed in a blocking way:
{code:java}
 launcher.cleanupContainer();
{code}
With this Jira, I can focus on making the {{CLEANUP_CONTAINER}} and 
{{CLEANUP_CONTAINER_FOR_REINIT}} events non-blocking.  

It doesn't look like the caller ({{ContainerImpl}}) waits anywhere for 
{{cleanupContainer()}} to complete synchronously. It is triggered by 
dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events.
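A rough sketch of the non-blocking shape this implies (illustrative only, not 
the eventual patch):

{code:java}
// Sketch only: illustrative, not the eventual patch.
case CLEANUP_CONTAINER:
case CLEANUP_CONTAINER_FOR_REINIT:
  // Submit the cleanup to the same executor used for launches instead
  // of calling launcher.cleanupContainer() on the dispatcher thread.
  containerLauncher.submit(new Callable<Void>() {
    @Override
    public Void call() throws Exception {
      launcher.cleanupContainer();
      return null;
    }
  });
  break;
{code}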

> NM gets backed up deleting docker containers
> 
>
> Key: YARN-7644
> URL: https://issues.apache.org/jira/browse/YARN-7644
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Eric Badger
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
>
> We send a {{docker stop}} to the docker container, with a timeout of 10 
> seconds, when we shut down a container. If the container does not stop after 
> 10 seconds, we force kill it. However, the {{docker stop}} command is a 
> blocking call, so in cases where lots of containers don't go down with the 
> initial SIGTERM, we have to wait 10+ seconds for each {{docker stop}} to 
> return. This ties up the ContainerLaunch handler, so these kill events back 
> up, and it appears to back up new container launches as well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626448#comment-16626448
 ] 

Hadoop QA commented on YARN-8800:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} pylint {color} | {color:orange}  0m  
7s{color} | {color:orange} The patch generated 74 new + 0 unchanged - 0 fixed = 
74 total (was 0) {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
17s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 21 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
22s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 55s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8800 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941103/YARN-8800.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  shellcheck  shelldocs  pylint  |
| uname | Linux bd0dfe41f7ea 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c07715e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| shellcheck | v0.4.6 |
| pylint | v1.9.2 |
| pylint | 
https://builds.apache.org/job/PreCommit-YARN-Build/21952/artifact/out/diff-patch-pylint.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/21952/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21952/testReport/ |
| asflicense | 

[jira] [Commented] (YARN-8627) EntityGroupFSTimelineStore hdfs done directory keeps on accumulating

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626418#comment-16626418
 ] 

Wangda Tan commented on YARN-8627:
--

[~rohithsharma], could you help review the latest comment?

> EntityGroupFSTimelineStore hdfs done directory keeps on accumulating
> 
>
> Key: YARN-8627
> URL: https://issues.apache.org/jira/browse/YARN-8627
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-8627.001.patch, YARN-8627.002.patch
>
>
> The EntityLogCleaner thread exits with the following ERROR every time it 
> runs:  
> {code:java}
> 2018-07-18 19:59:39,837 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18268
> 2018-07-18 19:59:39,844 INFO timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:cleanLogs(462)) - Deleting 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
> 2018-07-18 19:59:39,848 ERROR timeline.EntityGroupFSTimelineStore 
> (EntityGroupFSTimelineStore.java:run(899)) - Error cleaning files  
> java.io.FileNotFoundException: File 
> hdfs://namenode/ats/done/1499684568068//018/application_1499684568068_18270
>  does not exist.  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1062)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1069)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1040)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1019)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1015)
>   at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.shouldCleanAppLogDir(EntityGroupFSTimelineStore.java:480)
>  
> {code}
>  
>  Each time the thread gets scheduled, a different folder hits the error. As 
> a result, the thread cannot clean all of the old done directories, since it 
> stops after the first error. 
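A sketch of the obvious hardening (illustrative structure, not the attached 
patch): tolerate directories that disappear between listing and inspection, 
and keep iterating instead of aborting the scan.

{code:java}
// Sketch only: illustrative structure, not the attached patch.
for (FileStatus appDirStatus : appDirs) {
  try {
    if (shouldCleanAppLogDir(appDirStatus.getPath(), fs)) {
      fs.delete(appDirStatus.getPath(), true);
    }
  } catch (FileNotFoundException e) {
    // The dir was removed between listing and this check (e.g. by a
    // concurrent cleaner); skip it instead of aborting the whole scan.
    LOG.debug("App log dir already gone: " + appDirStatus.getPath(), e);
  }
}
{code}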



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-24 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.7.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch, YARN-8789.3.patch, 
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never shrinking.
> I started looking at the code and thought it could use some cleanup, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.
> Logging message:
> Size of event-queue is xxx
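A minimal sketch of the core idea (not the attached patch): back the 
dispatcher with a capacity-bounded queue so producers block instead of growing 
the heap without bound.

{code:java}
// Sketch only: the core idea, not the attached patch.
BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>(capacity);

// Producer side: put() blocks while the queue is full, throttling event
// sources (both put() and take() throw InterruptedException).
eventQueue.put(event);

// Consumer side: the dispatcher thread drains as before.
Event next = eventQueue.take();
{code}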



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8665) Yarn Service Upgrade: Support cancelling upgrade

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626399#comment-16626399
 ] 

Hadoop QA commented on YARN-8665:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 3 new + 443 unchanged - 4 fixed = 446 total (was 447) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
17s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 56s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
17s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
42s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
45s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}148m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestNMProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce 

[jira] [Commented] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626345#comment-16626345
 ] 

Wangda Tan commented on YARN-8800:
--

Attached ver.1 patch; it includes some screenshots, which is why the patch is 
large.

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8800) Updated documentation of Submarine with latest examples.

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8800:
-
Attachment: YARN-8800.001.patch

> Updated documentation of Submarine with latest examples.
> 
>
> Key: YARN-8800
> URL: https://issues.apache.org/jira/browse/YARN-8800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8800.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626315#comment-16626315
 ] 

Hadoop QA commented on YARN-8815:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 178 unchanged - 0 fixed = 179 total (was 178) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 77m  
3s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}136m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8815 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941075/YARN-8815.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7057662844d3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 62f817d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21949/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21949/testReport/ |
| Max. process+thread count | 892 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2018-09-24 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626288#comment-16626288
 ] 

Haibo Chen commented on YARN-1011:
--

The tests look good locally. I have pushed my local branch upstream. Let me 
know if you see issues [~asuresh].

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
>Assignee: Karthik Kambatla
>Priority: Major
> Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, 
> yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf, yarn-1011-design-v3.pdf
>
>
> Currently the RM allocates containers and assumes the allocated resources are 
> utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626273#comment-16626273
 ] 

Hudson commented on YARN-8696:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15042 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15042/])
YARN-8696. [AMRMProxy] FederationInterceptor upgrade: home sub-cluster (gifuma: 
rev 3090922805699b8374a359e92323884a4177dc4e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/AMHeartbeatRequestHandler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedApplicationManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/AMRMClientUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestableFederationInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/MockResourceManagerFacade.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/utils/FederationRegistryClient.java


> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, 
> YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, 
> YARN-8696.v5.patch, YARN-8696.v6.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, the 
> interceptor waits for the home response to come back before merging and 
> returning the (merged) heartbeat result back to the AM. If the home 
> sub-cluster is suffering from connection issues, or is down during a YarnRM 
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will be 
> blocked waiting for the home response. As a result, the successful UAM 
> heartbeats from secondary sub-clusters will not be returned to the AM at all. 
> Additionally, because we kept the same heartbeat responseId between the AM 
> and the home RM, lots of tricky handling is needed for the responseId resync 
> when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) 
> work-preserving restart (YARN-6127, YARN-1336), home RM master-slave 
> switches, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in the secondaries, so 
> that any sub-cluster being down or having connection issues won't prevent the 
> AM from getting responses from other sub-clusters. The responseId is also 
> managed separately for the home sub-cluster and the AM, and they increment 
> independently. The resync logic becomes much cleaner. 
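A sketch of the asynchronous shape described above (illustrative names, not 
the actual FederationInterceptor code):

{code:java}
// Sketch only: illustrative names, not the actual FederationInterceptor code.
// The home heartbeat is handed to a dedicated handler thread; its response
// is merged via callback, so a slow or dead home RM no longer blocks the
// merging of UAM responses from secondary sub-clusters.
homeHeartbeatHandler.allocateAsync(allocateRequest,
    new AsyncCallback<AllocateResponse>() {
      @Override
      public void callback(AllocateResponse homeResponse) {
        mergeAllocateResponse(homeResponse);  // hypothetical merge helper
      }
    });
{code}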



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8810) Yarn Service: discrepancy between hashcode and equals of ConfigFile

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626266#comment-16626266
 ] 

Hadoop QA commented on YARN-8810:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
10s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8810 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941081/YARN-8810.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e2c37fe17ebc 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8de5c92 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21951/testReport/ |
| Max. process+thread count | 757 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21951/console |
| 

[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-24 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8696:
---
Component/s: nodemanager

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, 
> YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, 
> YARN-8696.v5.patch, YARN-8696.v6.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, the 
> interceptor waits for the home response to come back before merging and 
> returning the (merged) heartbeat result back to the AM. If the home 
> sub-cluster is suffering from connection issues, or is down during a YarnRM 
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will be 
> blocked waiting for the home response. As a result, the successful UAM 
> heartbeats from secondary sub-clusters will not be returned to the AM at all. 
> Additionally, because we kept the same heartbeat responseId between the AM 
> and the home RM, lots of tricky handling is needed for the responseId resync 
> when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) 
> work-preserving restart (YARN-6127, YARN-1336), home RM master-slave 
> switches, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in the secondaries, so 
> that any sub-cluster being down or having connection issues won't prevent the 
> AM from getting responses from other sub-clusters. The responseId is also 
> managed separately for the home sub-cluster and the AM, and they increment 
> independently. The resync logic becomes much cleaner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-24 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8696:
---
Fix Version/s: 3.2.0

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, 
> YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, 
> YARN-8696.v5.patch, YARN-8696.v6.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, the 
> interceptor waits for the home response to come back before merging and 
> returning the (merged) heartbeat result back to the AM. If the home 
> sub-cluster is suffering from connection issues, or is down during a YarnRM 
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will be 
> blocked waiting for the home response. As a result, the successful UAM 
> heartbeats from secondary sub-clusters will not be returned to the AM at all. 
> Additionally, because we kept the same heartbeat responseId between the AM 
> and the home RM, lots of tricky handling is needed for the responseId resync 
> when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) 
> work-preserving restart (YARN-6127, YARN-1336), home RM master-slave 
> switches, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in the secondaries, so 
> that any sub-cluster being down or having connection issues won't prevent the 
> AM from getting responses from other sub-clusters. The responseId is also 
> managed separately for the home sub-cluster and the AM, and they increment 
> independently. The resync logic becomes much cleaner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-24 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626251#comment-16626251
 ] 

Giovanni Matteo Fumarola commented on YARN-8696:


Thanks [~botong] for the patch.

Committed [^YARN-8696.v6.patch] to trunk and [^YARN-8696-branch-2.v6.patch] to 
branch-2.

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8696-branch-2.v6.patch, YARN-8696.v1.patch, 
> YARN-8696.v2.patch, YARN-8696.v3.patch, YARN-8696.v4.patch, 
> YARN-8696.v5.patch, YARN-8696.v6.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, the 
> interceptor waits for the home response to come back before merging and 
> returning the (merged) heartbeat result back to the AM. If the home 
> sub-cluster is suffering from connection issues, or is down during a YarnRM 
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will be 
> blocked waiting for the home response. As a result, the successful UAM 
> heartbeats from secondary sub-clusters will not be returned to the AM at all. 
> Additionally, because we kept the same heartbeat responseId between the AM 
> and the home RM, lots of tricky handling is needed for the responseId resync 
> when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) 
> work-preserving restart (YARN-6127, YARN-1336), home RM master-slave 
> switches, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in the secondaries, so 
> that any sub-cluster being down or having connection issues won't prevent the 
> AM from getting responses from other sub-clusters. The responseId is also 
> managed separately for the home sub-cluster and the AM, and they increment 
> independently. The resync logic becomes much cleaner. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626139#comment-16626139
 ] 

Bibin A Chundatt edited comment on YARN-8815 at 9/24/18 6:31 PM:
-

{quote}
 finished unmanaged app?
{quote}
This will happen for a finished unmanaged app, since in the final state the 
AppData will be pruned.

In addition to the issue mentioned in this jira, there is an impact on the 
applicationReport and RMAppBlock display fields. [~sunilg]/[~rohithsharma], 
could you cross-check?


was (Author: bibinchundatt):
{quote}
 finished unmanaged app?
{quote}
This will happen for a finished unmanaged app, since in the final state the 
AppData will be pruned.
In addition, there should be an impact on the applicationReport and RMAppBlock 
display fields.

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at 

[jira] [Updated] (YARN-8810) Yarn Service: discrepancy between hashcode and equals of ConfigFile

2018-09-24 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8810:

Attachment: YARN-8810.001.patch

> Yarn Service: discrepancy between hashcode and equals of ConfigFile
> ---
>
> Key: YARN-8810
> URL: https://issues.apache.org/jira/browse/YARN-8810
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Minor
> Attachments: YARN-8810.001.patch
>
>
> The {{ConfigFile}} class {{equals}} method doesn't check the equality of 
> {{properties}}. The {{hashCode}} does include the {{properties}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8665) Yarn Service Upgrade: Support cancelling upgrade

2018-09-24 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8665:

Attachment: YARN-8665.005.patch

> Yarn Service Upgrade:  Support cancelling upgrade
> -
>
> Key: YARN-8665
> URL: https://issues.apache.org/jira/browse/YARN-8665
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8665.001.patch, YARN-8665.002.patch, 
> YARN-8665.003.patch, YARN-8665.004.patch, YARN-8665.005.patch
>
>
> When a service is upgraded without auto-finalization or express upgrade, the 
> upgrade can be cancelled. This gives the user the ability to test the 
> upgrade on a single instance and, if that doesn't go well, a chance to 
> cancel it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626139#comment-16626139
 ] 

Bibin A Chundatt commented on YARN-8815:


{quote}
 finished unmanaged app?
{quote}
This will happen for a finished unmanaged app, since in the final state the 
AppData will be pruned.
In addition, there should be an impact on the applicationReport and RMAppBlock 
display fields.

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> 

[jira] [Commented] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-24 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626131#comment-16626131
 ] 

Haibo Chen commented on YARN-8808:
--

Hi [~asuresh]. I'm not sure I follow you correctly. It sounds like you are 
referring to aggregateUtilization as the aggregate resource *ALLOCATION* of 
all containers, right?

In the case of Fair Scheduler + oversubscription, aggregateUtilization is the 
aggregate resource *UTILIZATION* of all containers on a node (rather than the 
aggregate allocation).

The issue here came up in our testing configuration, where only a fraction of 
the node's hardware resources is allowed to run containers (say the node has 
10GB memory and 10 vcores, but through configuration we allow the RM to only 
see 8GB and 8 vcores). Therefore, I think the scheduler side should just see 
numbers based on the configured node capacity (8GB, 8 vcores). nodeUtilization, 
by default, is detected from the OS and is therefore based on the node's 
actual capacity (10GB, 10 vcores).

aggregateUtilization/nodeUtilization would tell us what percentage of the node 
utilization is attributed to running containers, no?
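
For illustration, a minimal sketch of the headroom computation implied above, 
under assumed names and a hypothetical oversubscription threshold (not the 
actual YARN-1011 branch code):
{code:java}
// Sketch only: oversubscription headroom is bounded by the *configured* node
// capacity minus the aggregate utilization of running containers, not by the
// OS-reported node utilization (which reflects full hardware capacity).
public final class OversubscriptionHeadroomSketch {

  static long headroomMb(long configuredCapacityMb,
      long aggregateContainerUtilizationMb, double threshold) {
    // e.g. configured 8192 MB, containers actually using 3000 MB,
    // threshold 0.8 -> floor(8192 * 0.8) - 3000 = 3553 MB of headroom
    // that could be offered to OPPORTUNISTIC containers.
    return (long) Math.floor(configuredCapacityMb * threshold)
        - aggregateContainerUtilizationMb;
  }

  public static void main(String[] args) {
    System.out.println(headroomMb(8192, 3000, 0.8)); // prints 3553
  }
}
{code}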

> Use aggregate container utilization instead of node utilization to determine 
> resources available for oversubscription
> -
>
> Key: YARN-8808
> URL: https://issues.apache.org/jira/browse/YARN-8808
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8088-YARN-1011.01.patch, 
> YARN-8808-YARN-1011.00.patch
>
>
> Resource oversubscription should be bound to the amount of resources that 
> can be allocated to containers; hence the allocation threshold should be 
> with respect to aggregate container utilization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626124#comment-16626124
 ] 

Bibin A Chundatt edited comment on YARN-8815 at 9/24/18 5:04 PM:
-

[~rohithsharma]/[~sunilg]

During recovery, the applicationSubmissionContext doesn't have info on whether 
the app is unmanaged or not.
If Resource=null & amReqs is empty, then the above exception will be thrown 
from {{RMAppManager#validateAndCreateResourceRequest}}.

Added a testcase to simulate the issue at the same method.


was (Author: bibinchundatt):
[~rohithsharma]/[~sunilg]

During recovery, the applicationSubmissionContext doesn't have info on whether 
the app is unmanaged or not.
If resourceRequest=null & amReqs is empty, then the above exception will be 
thrown from {{RMAppManager#validateAndCreateResourceRequest}}.

Added a testcase to simulate the issue at the same method.

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at 

[jira] [Comment Edited] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626124#comment-16626124
 ] 

Bibin A Chundatt edited comment on YARN-8815 at 9/24/18 5:04 PM:
-

[~rohithsharma]/[~sunilg]

During recovery, the applicationSubmissionContext doesn't have info on whether 
the app type is unmanaged or not.
If Resource=null & amReqs is empty, then the above exception will be thrown 
from {{RMAppManager#validateAndCreateResourceRequest}}.

Added a testcase to simulate the issue at the same method.


was (Author: bibinchundatt):
[~rohithsharma]/[~sunilg]

During recovery, the applicationSubmissionContext doesn't have info on whether 
the app is unmanaged or not.
If Resource=null & amReqs is empty, then the above exception will be thrown 
from {{RMAppManager#validateAndCreateResourceRequest}}.

Added a testcase to simulate the issue at the same method.

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at 

[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626124#comment-16626124
 ] 

Bibin A Chundatt commented on YARN-8815:


[~rohithsharma]/[~sunilg]

During recovery, the applicationSubmissionContext doesn't have info on whether 
the app is unmanaged or not.
If resourceRequest=null & amReqs is empty, then the above exception will be 
thrown from {{RMAppManager#validateAndCreateResourceRequest}}.

Added a testcase to simulate the issue at the same method.
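
For illustration, a minimal sketch of the guard described above (assumed 
names, not the actual YARN-8815 patch; {{AppContext}} stands in for 
ApplicationSubmissionContext):
{code:java}
import java.util.Collections;
import java.util.List;

// Sketch only: an unmanaged AM has no AM ResourceRequest to validate, so
// recovery must skip the validation instead of failing with
// InvalidResourceRequestException and leaving both RMs in standby.
public class RecoveryValidationSketch {

  static class AppContext {
    boolean unmanagedAM;
    List<String> amResourceRequests; // may be null or empty on recovery
  }

  static List<String> validateAndCreateResourceRequest(AppContext ctx) {
    if (ctx.unmanagedAM) {
      // Nothing to validate for an unmanaged AM; mirror the submit path.
      return null;
    }
    if (ctx.amResourceRequests == null || ctx.amResourceRequests.isEmpty()) {
      throw new IllegalArgumentException(
          "Invalid resource request, no resources requested");
    }
    return Collections.unmodifiableList(ctx.amResourceRequests);
  }
}
{code}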

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> 

[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8815:
---
Attachment: YARN-8815.001.patch

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
> Attachments: YARN-8815.001.patch
>
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> 

[jira] [Updated] (YARN-8809) Refactor AbstractYarnScheduler and CapacityScheduler OPPORTUNISTIC container completion codepaths

2018-09-24 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8809:
-
Description: When OPPORTUNISTIC containers are released, the fair scheduler 
does not update the queue metrics correctly.

> Refactor AbstractYarnScheduler and CapacityScheduler OPPORTUNISTIC container 
> completion codepaths
> -
>
> Key: YARN-8809
> URL: https://issues.apache.org/jira/browse/YARN-8809
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8809-YARN-1011.00.patch, 
> YARN-8809-YARN-1011.01.patch
>
>
> When OPPORTUNISTIC containers are released, the fair scheduler does not 
> update the queue metrics correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers

2018-09-24 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626084#comment-16626084
 ] 

Haibo Chen commented on YARN-1011:
--

Sure. I am testing my local rebased branch. Will push it once the tests finish 
without failures.

> [Umbrella] Schedule containers based on utilization of currently allocated 
> containers
> -
>
> Key: YARN-1011
> URL: https://issues.apache.org/jira/browse/YARN-1011
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
>Assignee: Karthik Kambatla
>Priority: Major
> Attachments: patch-for-yarn-1011.patch, yarn-1011-design-v0.pdf, 
> yarn-1011-design-v1.pdf, yarn-1011-design-v2.pdf, yarn-1011-design-v3.pdf
>
>
> Currently the RM allocates containers and assumes the resources allocated 
> are utilized.
> The RM can, and should, get to a point where it measures the utilization of 
> allocated containers and, if appropriate, allocates more (speculative?) 
> containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-24 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626081#comment-16626081
 ] 

Haibo Chen commented on YARN-8468:
--

[~cheersyang] The reason why normalization is done in the scheduler is that we 
introduced a queue-level configuration here, and queues are 
scheduler-dependent concepts.

{{ApplicationMasterProtocol}} is the connection path between AMs and the RM 
for non-AM container requests. For AM-container requests, the normalization is 
done by RMAppManager. 

I hope that clarifies your question.
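
For illustration, a minimal sketch of the per-queue lookup with a 
scheduler-level fallback (assumed names, not the actual YARN-8468 patch):
{code:java}
// Sketch only: a queue-level maximum allocation overrides the scheduler-wide
// yarn.scheduler.maximum-allocation-mb, but can never exceed it.
public final class QueueMaxAllocationSketch {

  static final long SCHEDULER_MAX_MB = 8192; // scheduler-wide cap

  static long maxAllocationMb(Long queueMaxMb) {
    if (queueMaxMb == null) {
      // No queue-level override configured: fall back to the scheduler cap.
      return SCHEDULER_MAX_MB;
    }
    // A queue cap larger than the scheduler cap is not allowed.
    return Math.min(queueMaxMb, SCHEDULER_MAX_MB);
  }

  public static void main(String[] args) {
    System.out.println(maxAllocationMb(null));   // 8192
    System.out.println(maxAllocationMb(2048L));  // 2048 (ad hoc queue)
    System.out.println(maxAllocationMb(16384L)); // 8192 (capped)
  }
}
{code}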

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited by queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue 
> basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers but 
> allow enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets a default maximum container size 
> for all queues, while the per-queue maximum is set with the 
> “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root, we override the scheduler setting, and we 
> should not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * enforce the use of the queue-based maximum allocation limit if it is 
> available; if not, use the general scheduler-level setting
>  ** use it during validation and normalization of requests in 
> scheduler.allocate, app submit, and resource request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8811) Support Container Storage Interface (CSI) in YARN

2018-09-24 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626064#comment-16626064
 ] 

Eric Yang commented on YARN-8811:
-

[~cheersyang] {quote}For the comment about object store user API key 
information, I am not sure about this point, could you please elaborate.{quote}

Per the [Hadoop AWS Integration 
document|https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html],
 we need to specify fs.s3a.access.key and fs.s3a.secret.key to connect to an 
AWS account. The same principle applies to Swift FS and other object stores. 
Therefore, the CSI specification should include the ability to pass the key 
information needed to connect to an object store. The current specification is 
also missing the source storage information.

Propagation options are required to make sure multiple mounts of the same 
source storage system can be shared or exclusive. Without this defined, it 
might be troublesome for the source storage system to decide on the locking 
mechanism.
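
For illustration, object-store credentials today are plain Hadoop 
configuration keys, so a CSI volume spec for YARN would need a way to carry 
equivalent key/value pairs to the driver. A minimal sketch (placeholder 
credentials and a hypothetical bucket name):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ACredentialSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Standard s3a credential properties; the values are placeholders and
    // "my-bucket" is hypothetical.
    conf.set("fs.s3a.access.key", "ACCESS_KEY_PLACEHOLDER");
    conf.set("fs.s3a.secret.key", "SECRET_KEY_PLACEHOLDER");
    FileSystem fs =
        FileSystem.get(new Path("s3a://my-bucket/").toUri(), conf);
    System.out.println(fs.getUri());
  }
}
{code}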
 

> Support Container Storage Interface (CSI) in YARN
> -
>
> Key: YARN-8811
> URL: https://issues.apache.org/jira/browse/YARN-8811
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: Support Container Storage Interface(CSI) in YARN_design 
> doc_20180921.pdf
>
>
> The Container Storage Interface (CSI) is a vendor neutral interface to bridge 
> Container Orchestrators and Storage Providers. With the adoption of CSI in 
> YARN, it will be easier to integrate 3rd party storage systems, and provide 
> the ability to attach persistent volumes for stateful applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626062#comment-16626062
 ] 

Rohith Sharma K S commented on YARN-8815:
-

[~Rakesh_Shah] Could you give a bit more detail: does this happen while 
recovering a running unmanaged AM, or a finished unmanaged app?

[~bibinchundatt] Could you clarify how YARN-5028 breaks?

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> 

[jira] [Comment Edited] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2018-09-24 Thread Rahul Anand (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625870#comment-16625870
 ] 

Rahul Anand edited comment on YARN-7592 at 9/24/18 3:46 PM:


Thanks [~bibinchundatt] and [~subru].

Removing *yarn.federation.enabled* from yarn-site.xml can solve this issue but 
would definitely create confusion. So, instead of changing/removing a 
meaningful federation flag or updating the doc, an alternative solution can be 
the creation of a {{FederationCustomClientRMProxy}} which can override 
{{ClientRMProxy#createRMProxy}} in {{AMRMClientUtils}} to always select the 
*proxy provider* as {{FederationRMFailoverProxyProvider}} for federation.
{code:java}
public static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, UserGroupInformation user,
    final Token<? extends TokenIdentifier> token) throws IOException {
  ...
  return FederationCustomClientRMProxy.createRMProxy(configuration,
      protocol);
  ...
}
{code}
After this, we can remove the {{isFederationEnabled}} check from 
{{RMProxy.java}} as before. 
{code:java}
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy<T> instance) throws IOException {
  ...
  RetryPolicy retryPolicy = createRetryPolicy(conf,
      (HAUtil.isHAEnabled(conf)));
  ...
}
{code}
{code:java}
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy<T> instance, final long retryTime,
    final long retryInterval) throws IOException {
  ...
  RetryPolicy retryPolicy = createRetryPolicy(conf, retryTime, retryInterval,
      HAUtil.isHAEnabled(conf));
  ...
}
{code}
With this change, we don't need to separately specify the *proxy provider* for 
HA and non-HA scenarios in the federation case, while other non-federation 
settings continue as they are.


was (Author: rahulanand90):
Thanks [~bibinchundatt] and [~subru]. 

Removing *yarn.federation.enabled* from yarn-site.xml can solve this issue but 
would definitely create confusion. So, instead of changing/removing a 
meaningful federation flag or updating the doc, an alternative solution can be 
the creation of a {{FederationCustomClientRMProxy}} which can override 
{{ClientRMProxy#createRMProxy}} in {{AMRMClientUtils}} to always select the 
*proxy provider* as {{FederationRMFailoverProxyProvider}} for federation.
{code:java}
public static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, UserGroupInformation user,
    final Token<? extends TokenIdentifier> token) throws IOException {
  ...
  return FederationCustomClientRMProxy.createRMProxy(configuration,
      protocol);
  ...
}
{code}
After this, we can remove the {{isFederationEnabled}} check from 
{{RMProxy.java}} as before. 
{code:java}
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy<T> instance) throws IOException {
  ...
  RetryPolicy retryPolicy = createRetryPolicy(conf,
      (HAUtil.isHAEnabled(conf)));
  ...
}
{code}
{code:java}
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy<T> instance, final long retryTime,
    final long retryInterval) throws IOException {
  ...
  RetryPolicy retryPolicy = createRetryPolicy(conf, retryTime, retryInterval,
      HAUtil.isHAEnabled(conf));
  ...
}
{code}
With this change, we don't need to separately specify the *proxy provider* for 
HA and non-HA scenarios.

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625938#comment-16625938
 ] 

Hadoop QA commented on YARN-5939:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: The patch generated 1 new + 
90 unchanged - 1 fixed = 91 total (was 91) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
21s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-5939 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941050/YARN-5939.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c9a6bf4aed86 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 32a35dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21948/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21948/testReport/ |
| Max. process+thread count | 300 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 

[jira] [Updated] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM

2018-09-24 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8816:
--
Description: 
{code}
Linux apache-dev 4.15.0-34-generic #37~16.04.1-Ubuntu SMP Tue Aug 28 10:44:06 
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
{code}

{code}
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s 
<<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
[ERROR] 
testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands)
  Time elapsed: 2.668 s  <<< ERROR!
java.lang.ExceptionInInitializerError
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
at 
org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
at 
org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:164)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:143)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:139)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.<init>(SecurityUtil.java:593)
at 
org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129)
at 
org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:102)
at 
org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:88)
... 38 more
{code}

  was:
{code}
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s 
<<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
[ERROR] 
testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands)
  Time elapsed: 2.668 s  <<< ERROR!

[jira] [Created] (YARN-8816) YARN Unit Tests Fail with Ubuntu VM

2018-09-24 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created YARN-8816:
-

 Summary: YARN Unit Tests Fail with Ubuntu VM
 Key: YARN-8816
 URL: https://issues.apache.org/jira/browse/YARN-8816
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.2.0
Reporter: BELUGA BEHR


{code}
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 3.926 s 
<<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands
[ERROR] 
testRemoveApplicationFromStateStoreCmdForZK(org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands)
  Time elapsed: 2.668 s  <<< ERROR!
java.lang.ExceptionInInitializerError
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:316)
at 
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:304)
at 
org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1828)
at 
org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:286)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1381)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:164)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:143)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:139)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands.testRemoveApplicationFromStateStoreCmdForZK(TestRMStoreCommands.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.security.SecurityUtil$QualifiedHostResolver.<init>(SecurityUtil.java:593)
at 
org.apache.hadoop.security.SecurityUtil.setTokenServiceUseIp(SecurityUtil.java:129)
at 
org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:102)
at 
org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:88)
... 38 more
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml

2018-09-24 Thread Rahul Anand (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625870#comment-16625870
 ] 

Rahul Anand commented on YARN-7592:
---

Thanks [~bibinchundatt] and [~subru]. 

Removing *yarn.federation.enabled* from yarn-site.xml can solve this issue but 
would definitely create confusion. So, instead of changing/removing a 
meaningful federation flag or only updating the docs, an alternative is to 
create a {{FederationCustomClientRMProxy}} that overrides 
{{ClientRMProxy#createRMProxy}} in {{AMRMClientUtils}} so that the *proxy 
provider* is always {{FederationRMFailoverProxyProvider}} for federation.
{code:java}
public static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, UserGroupInformation user,
    final Token<? extends TokenIdentifier> token) throws IOException {
  ...
  return FederationCustomClientRMProxy.createRMProxy(configuration,
      protocol);
  ...
}
{code}
After this, we can remove the {{isFederationEnabled}} check from 
{{RMProxy.java}}, restoring it to its earlier form:
{code:java}
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy<T> instance) throws IOException {
  ...
  RetryPolicy retryPolicy = createRetryPolicy(conf,
      HAUtil.isHAEnabled(conf));
  ...
}
{code}
{code:java}
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy<T> instance, final long retryTime,
    final long retryInterval) throws IOException {
  ...
  RetryPolicy retryPolicy = createRetryPolicy(conf, retryTime, retryInterval,
      HAUtil.isHAEnabled(conf));
  ...
}
{code}
With this change, we don't need to separately specify the *proxy provider* for 
HA and non-HA scenarios.
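
For illustration, a minimal sketch of the wrapper idea. The class below is 
hypothetical (the real fix would also need to hook into 
{{AMRMClientUtils}}); only {{ClientRMProxy#createRMProxy}} and the 
{{yarn.client.failover-proxy-provider}} configuration key are existing API.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.ClientRMProxy;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical sketch, not actual Hadoop code: pin the proxy provider to
// the federation implementation before delegating to ClientRMProxy.
public final class FederationCustomClientRMProxy {

  private FederationCustomClientRMProxy() {
  }

  public static <T> T createRMProxy(Configuration conf, Class<T> protocol)
      throws IOException {
    // Always select FederationRMFailoverProxyProvider, so HA and non-HA
    // clusters take the same code path when federation is enabled.
    conf.set(YarnConfiguration.CLIENT_FAILOVER_PROXY_PROVIDER,
        "org.apache.hadoop.yarn.server.federation.failover."
            + "FederationRMFailoverProxyProvider");
    return ClientRMProxy.createRMProxy(conf, protocol);
  }
}
{code}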

> yarn.federation.failover.enabled missing in yarn-default.xml
> 
>
> Key: YARN-7592
> URL: https://issues.apache.org/jira/browse/YARN-7592
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0-beta1
>Reporter: Gera Shegalov
>Priority: Major
> Attachments: IssueReproduce.patch
>
>
> yarn.federation.failover.enabled should be documented in yarn-default.xml. I 
> am also not sure why it should be true by default and force the HA retry 
> policy in {{RMProxy#createRMProxy}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5939) FSDownload leaks FileSystem resources

2018-09-24 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625853#comment-16625853
 ] 

Antal Bálint Steinbach commented on YARN-5939:
--

Hi [~cheersyang],

I saw this ticket has been open for a while, so I rebased the patch to apply 
to the current trunk. I was wondering how your test works: how can the 
wrapper class be used easily?
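
(For anyone hitting the leak described below: with the FileSystem cache 
disabled, every {{FileSystem.get()}} hands back a fresh instance that the 
caller owns and must close. A minimal sketch of the non-leaking pattern, 
using only the standard {{org.apache.hadoop.fs.FileSystem}} API; the class 
and method names are illustrative only.)
{code:java}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class NonLeakingRead {

  private NonLeakingRead() {
  }

  // With fs.<scheme>.impl.disable.cache=true every FileSystem.get()
  // creates a new instance, so it must be closed or the underlying
  // file descriptors leak.
  public static long fileLength(URI uri, Configuration conf, Path path)
      throws IOException {
    try (FileSystem fs = FileSystem.get(uri, conf)) {
      return fs.getFileStatus(path).getLen();
    }
  }
}
{code}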

> FSDownload leaks FileSystem resources
> -
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1, 2.7.3
>Reporter: liuxiangwei
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.01.patch, 
> YARN-5939.02.patch, YARN-5939.03.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the configuration item 
> "fs.%s.impl.disable.cache" must be set to true.
> In YARN's source code, the class 
> "org.apache.hadoop.yarn.util.FSDownload" calls getFileSystem but never 
> closes it, which leads to a file descriptor leak because our self-defined 
> FileSystem class closes the file descriptor when close is invoked.
> My questions:
> 1. Is invoking "getFileSystem" without ever closing it YARN's expected 
> behavior?
> 2. What should we do in our self-defined FileSystem to resolve it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5939) FSDownload leaks FileSystem resources

2018-09-24 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-5939:
-
Attachment: YARN-5939.004.patch

> FSDownload leaks FileSystem resources
> -
>
> Key: YARN-5939
> URL: https://issues.apache.org/jira/browse/YARN-5939
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1, 2.7.3
>Reporter: liuxiangwei
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: leak
> Attachments: YARN-5939.004.patch, YARN-5939.01.patch, 
> YARN-5939.02.patch, YARN-5939.03.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Background
> To use our self-defined FileSystem class, the configuration item 
> "fs.%s.impl.disable.cache" must be set to true.
> In YARN's source code, the class 
> "org.apache.hadoop.yarn.util.FSDownload" calls getFileSystem but never 
> closes it, which leads to a file descriptor leak because our self-defined 
> FileSystem class closes the file descriptor when close is invoked.
> My questions:
> 1. Is invoking "getFileSystem" without ever closing it YARN's expected 
> behavior?
> 2. What should we do in our self-defined FileSystem to resolve it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-09-24 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625814#comment-16625814
 ] 

Sunil Govindan commented on YARN-8657:
--

{code:java}
} finally {
  readLock.unlock();
}
{code}
We use the same pattern now in the new method {{canAssignToUserWithCache}}, 
correct?

> User limit calculation should be read-lock-protected within LeafQueue
> -
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
>
> When async scheduling is enabled, user limit calculation could be wrong: 
> It is possible that scheduler calculated a user_limit, but inside 
> {{canAssignToUser}} it becomes staled. 
> We need to protect user limit calculation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625793#comment-16625793
 ] 

Sunil Govindan commented on YARN-8815:
--

Thanks [~Rakesh_Shah] and [~bibinchundatt].

I just saw this issue is marked for 3.2.0. Do we have a solution for this? If 
so, please share the patch. From the description, this looks like a real 
problem to me. Thanks.

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> 

[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-09-24 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625725#comment-16625725
 ] 

Antal Bálint Steinbach commented on YARN-8657:
--

Hi [~leftnoteasy] , 

Thanks for the patch.

I ran into a very small issue while reading your patch.

In line 1531 the lock is acquired inside the try block:
{code:java}
try {
  readLock.lock();
{code}
It is a better pattern to do it like this:
{code:java}
readLock.lock();
try {
  ...
} finally {
  readLock.unlock();
}
{code}
There are some threads about this on Stack Overflow, for example 
https://stackoverflow.com/questions/31058681/java-locking-structure-best-pattern

There are more examples of this in the file; I just wanted to raise it while 
you are modifying this area.
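
To make the motivation concrete, here is a small self-contained sketch 
(generic Java for illustration, not the LeafQueue code): if {{lock()}} sat 
inside the {{try}} and threw, the {{finally}} would call {{unlock()}} on a 
lock that was never acquired.
{code:java}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockPatternDemo {

  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private int userLimit;

  public int readUserLimit() {
    Lock readLock = rwLock.readLock();
    // Acquire outside the try: if lock() itself failed, we would never
    // enter the try, so the finally cannot unlock a lock we do not hold.
    readLock.lock();
    try {
      return userLimit;
    } finally {
      readLock.unlock();
    }
  }
}
{code}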

> User limit calculation should be read-lock-protected within LeafQueue
> -
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
>
> When async scheduling is enabled, user limit calculation could be wrong: 
> It is possible that scheduler calculated a user_limit, but inside 
> {{canAssignToUser}} it becomes staled. 
> We need to protect user limit calculation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625710#comment-16625710
 ] 

Hadoop QA commented on YARN-8468:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
30s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 38s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 8 new + 879 unchanged - 23 fixed = 887 total (was 902) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 
24s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}172m 17s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8468 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941024/YARN-8468.017.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  

[jira] [Commented] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625672#comment-16625672
 ] 

Bibin A Chundatt commented on YARN-8815:


Thank you [~Rakesh_Shah] for raising the issue.

Seems related to YARN-5028. Once the application is in a FINAL state, 
{{pruneAppState(ApplicationStateData appState)}} doesn't record whether the 
application is managed or unmanaged.
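
A rough sketch of the shape of a fix, for discussion only. The helper below 
is hypothetical; only 
{{ApplicationSubmissionContext#setUnmanagedAM}}/{{getUnmanagedAM}} are 
existing API.
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

// Hypothetical helper: when pruneAppState rebuilds the submission context
// of a finished application, the unmanaged-AM flag has to be carried over;
// otherwise recovery treats the app as managed and
// validateAndCreateResourceRequest rejects the empty resource request.
public final class SubmissionContextPruneHelper {

  private SubmissionContextPruneHelper() {
  }

  public static void copyUnmanagedFlag(ApplicationSubmissionContext original,
      ApplicationSubmissionContext pruned) {
    pruned.setUnmanagedAM(original.getUnmanagedAM());
  }
}
{code}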





> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> 

[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8815:
---
Priority: Critical  (was: Major)

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> 

[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8815:
---
Target Version/s: 3.2.0

> Both RM in standby after restart(restart failure)
> -
>
> Key: YARN-8815
> URL: https://issues.apache.org/jira/browse/YARN-8815
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Critical
>
>  
> *While running an unmanaged AM jar and restarting the RM, the RM goes into 
> standby.*
> *Below is the exception trace:*
> {noformat}
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> 2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, no resources requested
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>  at 
> 

[jira] [Updated] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8815:
---
Description: 
 

*While running an unmanaged AM jar and restarting the RM, the RM goes into standby.*

*Below is the exception trace:*
{noformat}
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, no resources requested
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
 at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: Service 
RMActiveServices failed in state STARTED
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, no resources requested
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
 at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
{noformat}

  was:
 

*While running an unmanaged AM jar and restarting the RM, the RM goes into standby.*

*Below is the exception trace:*

org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, no resources requested
 at 

[jira] [Created] (YARN-8815) Both RM in standby after restart(restart failure)

2018-09-24 Thread Rakesh Shah (JIRA)
Rakesh Shah created YARN-8815:
-

 Summary: Both RM in standby after restart(restart failure)
 Key: YARN-8815
 URL: https://issues.apache.org/jira/browse/YARN-8815
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Rakesh Shah


 

*While running an unmanaged AM jar and restarting the RM, the RM goes into standby.*

*Below is the exception trace:*

org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, no resources requested
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
 at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
2018-09-24 18:48:23,855 INFO org.apache.hadoop.service.AbstractService: Service 
RMActiveServices failed in state STARTED
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, no resources requested
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:510)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:359)
 at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:589)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1483)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:848)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1224)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1261)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1261)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
 at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
 at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For 

[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-09-24 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625623#comment-16625623
 ] 

Sunil Govindan commented on YARN-8657:
--

[~cheersyang], could you please check the latest patch?

> User limit calculation should be read-lock-protected within LeafQueue
> -
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
>
> When async scheduling is enabled, user limit calculation could be wrong: 
> It is possible that scheduler calculated a user_limit, but inside 
> {{canAssignToUser}} it becomes staled. 
> We need to protect user limit calculation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625616#comment-16625616
 ] 

Hadoop QA commented on YARN-8657:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 32s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 42 unchanged - 2 fixed = 43 total (was 44) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 16s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerMultiNodes
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8657 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941021/YARN-8657.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5f0e73189a29 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 32a35dc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
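
For context on this issue's subject (read-lock-protecting the user limit
calculation within LeafQueue): the pattern at stake is that readers of a
computed value take the queue's read lock, so they can never observe a
half-updated queue state while a reconfiguration is in flight. A
self-contained sketch of that pattern is below; the fields and arithmetic are
illustrative stand-ins rather than the actual YARN-8657 patch, and only the
ReentrantReadWriteLock usage mirrors what LeafQueue does.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: models read-lock-protected access to a derived value.
public class ReadLockProtectedUserLimit {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private int queueCapacityMb = 8192;   // illustrative queue state
  private float userLimitFactor = 1.0f; // illustrative queue state

  // Readers take the read lock: many threads may compute the limit
  // concurrently, but never while a writer holds the write lock.
  int computeUserLimitMb(int activeUsers) {
    lock.readLock().lock();
    try {
      // Both fields are read under one lock acquisition, so the result
      // can never mix pre- and post-reconfiguration values.
      return (int) (queueCapacityMb * userLimitFactor / Math.max(1, activeUsers));
    } finally {
      lock.readLock().unlock();
    }
  }

  // Writers take the write lock, excluding all readers while both
  // fields are updated together.
  void reconfigure(int capacityMb, float factor) {
    lock.writeLock().lock();
    try {
      this.queueCapacityMb = capacityMb;
      this.userLimitFactor = factor;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}
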

[jira] [Commented] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application

2018-09-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625577#comment-16625577
 ] 

Hadoop QA commented on YARN-7957:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
27m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-7957 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12941023/YARN-7957.002.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux ccc4b8c7a09e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 32a35dc |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 406 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21946/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [UI2] Yarn service delete option disappears after stopping application
> --
>
> Key: YARN-7957
> URL: https://issues.apache.org/jira/browse/YARN-7957
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Akhil PB
>Priority: Critical
> Attachments: YARN-7957.001.patch, YARN-7957.002.patch
>
>
> Steps:
> 1) Launch a YARN service.
> 2) Go to the service page and click the Setting button -> "Stop Service". 
> The application will be stopped.
> 3) Refresh the page.
> Here, the Setting button disappears, so the user cannot delete the service 
> from the UI after stopping the application.
> Expected behavior:
> The Setting button should still be present on the UI page after the 
> application is stopped. If the application is stopped, the Setting button 
> should only offer the "Delete Service" action (a REST-based deletion 
> workaround is sketched after this description).
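
While the button is missing, a hedged workaround: a stopped service can still
be destroyed through the YARN Service REST API (DELETE
/app/v1/services/{service-name}). The sketch below assumes an unsecured
cluster; the RM host/port and service name are placeholders, and a secured
cluster would additionally need authentication.

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: delete a stopped YARN service via the Service REST API.
public class DeleteStoppedService {
  public static void main(String[] args) throws Exception {
    String serviceName = args.length > 0 ? args[0] : "my-sleeper-service";
    // Placeholder RM address; adjust for the actual cluster.
    URL url = new URL("http://resourcemanager:8088/app/v1/services/" + serviceName);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("DELETE");
    // A 2xx response means the service and its persisted definition were removed.
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}
{code}
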






[jira] [Commented] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-24 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625559#comment-16625559
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

Hi [~cheersyang],

Thanks for your additional suggestions.

1) Fixed - very good point, thanks.

2) Fixed.

Further checkstyle issues are fixed as well. These are quite hard to find 
manually, since I cannot use auto-format or import organization here.

As for your question on the normalization topic: maybe [~haibochen] can give 
more details on this, since he is deeper into the FairScheduler internals.

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers while 
> allowing enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb then provides the default maximum 
> container size for all queues, and the per-queue maximum is set with the 
> “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * setting it on the root would override the scheduler setting, so we should 
> not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler (a hedged sketch follows this description).
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as well.
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * enforce the queue-based maximum allocation limit if it is available; if 
> not, fall back to the general scheduler-level setting.
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submission, and resource requests.
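
The fallback rule in the last two bullets is the crux. A minimal,
self-contained model of it is sketched below; the map-based lookup and
plain-MB values are illustrative assumptions standing in for FairScheduler's
queue hierarchy and Resource objects, not the actual YARN-8468 patch.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch only: models getMaximumResourceCapability(String queueName) with
// fallback to the scheduler-wide yarn.scheduler.maximum-allocation-mb value.
public class PerQueueMaxAllocation {
  // Scheduler-wide default maximum container size, in MB.
  static final long SCHEDULER_MAX_MB = 8192;
  // Per-queue "maxContainerResources" caps; a missing entry means no cap.
  static final Map<String, Long> queueMaxMb = new HashMap<>();

  static long getMaximumAllocationMb(String queueName) {
    Long queueMax = queueMaxMb.get(queueName);
    // The queue-based limit wins when configured, clamped so it can never
    // exceed the scheduler max; otherwise fall back to the general
    // scheduler-level setting.
    return queueMax != null ? Math.min(queueMax, SCHEDULER_MAX_MB) : SCHEDULER_MAX_MB;
  }

  public static void main(String[] args) {
    queueMaxMb.put("root.adhoc", 2048L); // small containers for ad hoc jobs
    System.out.println(getMaximumAllocationMb("root.adhoc"));      // 2048
    System.out.println(getMaximumAllocationMb("root.enterprise")); // 8192
  }
}
{code}
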






[jira] [Updated] (YARN-8468) Enable the use of queue based maximum container allocation limit and implement it in FairScheduler

2018-09-24 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8468:
-
Attachment: YARN-8468.017.patch

> Enable the use of queue based maximum container allocation limit and 
> implement it in FairScheduler
> --
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch, 
> YARN-8468.014.patch, YARN-8468.015.patch, YARN-8468.016.patch, 
> YARN-8468.017.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for 
> enterprise apps, and wants to limit ad hoc jobs to small containers while 
> allowing enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb then provides the default maximum 
> container size for all queues, and the per-queue maximum is set with the 
> “maxContainerResources” queue config value.
> Suggested solution:
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * setting it on the root would override the scheduler setting, so we should 
> not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler.
>  * implement getMaximumResourceCapability(String queueName) in both 
> FSParentQueue and FSLeafQueue as well.
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * enforce the queue-based maximum allocation limit if it is available; if 
> not, fall back to the general scheduler-level setting.
>  ** Use it during validation and normalization of requests in 
> scheduler.allocate, app submission, and resource requests.






[jira] [Commented] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application

2018-09-24 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625539#comment-16625539
 ] 

Akhil PB commented on YARN-7957:


Thanks [~sunilg] for your comments. Attached v2 patch with the above changes.

> [UI2] Yarn service delete option disappears after stopping application
> --
>
> Key: YARN-7957
> URL: https://issues.apache.org/jira/browse/YARN-7957
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Akhil PB
>Priority: Critical
> Attachments: YARN-7957.001.patch, YARN-7957.002.patch
>
>
> Steps:
> 1) Launch a YARN service.
> 2) Go to the service page and click the Setting button -> "Stop Service". 
> The application will be stopped.
> 3) Refresh the page.
> Here, the Setting button disappears, so the user cannot delete the service 
> from the UI after stopping the application.
> Expected behavior:
> The Setting button should still be present on the UI page after the 
> application is stopped. If the application is stopped, the Setting button 
> should only offer the "Delete Service" action.






[jira] [Updated] (YARN-7957) [UI2] Yarn service delete option disappears after stopping application

2018-09-24 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-7957:
---
Attachment: YARN-7957.002.patch

> [UI2] Yarn service delete option disappears after stopping application
> --
>
> Key: YARN-7957
> URL: https://issues.apache.org/jira/browse/YARN-7957
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Akhil PB
>Priority: Critical
> Attachments: YARN-7957.001.patch, YARN-7957.002.patch
>
>
> Steps:
> 1) Launch a YARN service.
> 2) Go to the service page and click the Setting button -> "Stop Service". 
> The application will be stopped.
> 3) Refresh the page.
> Here, the Setting button disappears, so the user cannot delete the service 
> from the UI after stopping the application.
> Expected behavior:
> The Setting button should still be present on the UI page after the 
> application is stopped. If the application is stopped, the Setting button 
> should only offer the "Delete Service" action.





