[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830859#comment-16830859 ] Hadoop QA commented on YARN-9440: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 33s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 34m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 31m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 7s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 40s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 189 unchanged - 19 fixed = 194 total (was 208) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 1s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 0s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}217m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9440 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967542/YARN-9440.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 26fbf1efdd6b 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personal
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830815#comment-16830815 ] Prabhu Joseph commented on YARN-6929: - Thanks [~eyang]. > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code:java} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) >
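To make the proposed layout concrete: the idea is to bucket aggregated logs under a date subdirectory so that no single logs directory accumulates more children than the NameNode's dfs.namenode.fs-limits.max-directory-items limit (1048576 by default) allows. The sketch below is illustrative only; the class, method, and placeholder names are assumptions and are not taken from the YARN-6929 patches.
{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import org.apache.hadoop.fs.Path;

// Hypothetical helper, not the actual YARN-6929 change: build the per-application
// aggregated-log directory with an extra date bucket so each directory stays well
// under the NameNode's max-directory-items limit.
public class DateBucketedLogDir {
  private static final DateTimeFormatter BUCKET_FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMdd");

  /** e.g. {remoteRootLogDir}/{user}/logs/20190430/{applicationId} */
  public static Path appLogDir(Path remoteRootLogDir, String user, String applicationId) {
    String bucket = LocalDate.now().format(BUCKET_FORMAT);
    return new Path(new Path(new Path(remoteRootLogDir, user), "logs/" + bucket), applicationId);
  }
}
{code}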
[jira] [Updated] (YARN-9507) Fix NPE in NodeManager#serviceStop on startup failure
[ https://issues.apache.org/jira/browse/YARN-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-9507: --- Summary: Fix NPE in NodeManager#serviceStop on startup failure (was: Fix NPE if NM fails to init) > Fix NPE in NodeManager#serviceStop on startup failure > - > > Key: YARN-9507 > URL: https://issues.apache.org/jira/browse/YARN-9507 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-9507-001.patch > > > 2019-04-24 14:06:44,101 WARN org.apache.hadoop.service.AbstractService: When > stopping the service NodeManager > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:492) > at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:947) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1018) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
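For context on the failure mode above: serviceInit() throws partway through startup, AbstractService.init() then calls stopQuietly(), and serviceStop() dereferences state that was never created. The usual fix is to null-guard such fields in serviceStop(); the sketch below shows that pattern with an invented service and field name, and is not the contents of YARN-9507-001.patch.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Illustrative pattern only: a service whose stop() tolerates a failed init().
public class SampleService extends AbstractService {
  private AutoCloseable resource; // hypothetical field, created during serviceInit()

  public SampleService() {
    super("SampleService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    resource = openResource(); // may throw before 'resource' is ever assigned
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStop() throws Exception {
    // Guard against a partially initialized service; without this check,
    // stop() after a failed init() throws a NullPointerException like the one above.
    if (resource != null) {
      resource.close();
    }
    super.serviceStop();
  }

  private AutoCloseable openResource() {
    return () -> { };
  }
}
{code}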
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: YARN-9440.003.patch > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch, > YARN-9440.003.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: (was: YARN-9440.003.patch) > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch, > YARN-9440.003.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830769#comment-16830769 ] Yufei Gu commented on YARN-9520: * "Inter-queue preemption will not happen among applications within the same queue." Yes. * "With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted." No. Only the oldest one has a lower chance of being preempted; all the others have the same chance. * "Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish?" Yes. The 3rd one can run. > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > Its good to have inter-queue-preemption-enabled, > intra-queue-preemption-enabled options for fair scheduler, i have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830714#comment-16830714 ] Hudson commented on YARN-6929: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16482 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16482/]) YARN-6929. Improved partition algorithm for yarn remote-app-log-dir. (eyang: rev accb811e5727f2a780a41cd5e50bab47a0cccb68) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestContainerLogsUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileControllerFactory.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/LogAggregationIndexedFileController.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogDeletionService.java > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. 
This can be > improved with adding date as a subdirectory like > //logs// > {code:java} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830687#comment-16830687 ] Eric Yang commented on YARN-6929: - +1 Patch 11 looks good to me. > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code:java} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop
[jira] [Comment Edited] (YARN-9523) Build application catalog docker image as part of hadoop dist build
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830658#comment-16830658 ] Eric Yang edited comment on YARN-9523 at 4/30/19 8:21 PM: -- [~jeagles], [~ebadger] you are welcome to chime in to HDDS-1458. I intend to add --privileged as an option to start-build-env.sh and to add the privileged requirement for integration tests with profile activation. You are welcome to add your feedback to ensure that those patches are done with community consensus. {quote}If there is a relevant discussion thread, please reference it here.{quote} The plan is in HADOOP-16091 and is on hold until we process the outcome of the previous iteration. Conversations took place in Cloudera internal Slack and web meetings. The idea is to move forward only after we think through our own thought process and current obstacles. was (Author: eyang): [~jeagles], [~ebadger] you are welcome to chime in to HDDS-1458. I intend to add --privileged as an option to start-build-env.sh and to add the privileged requirement for integration tests with profile activation. You are welcome to add your feedback to ensure that those patches are done with community consensus. > Build application catalog docker image as part of hadoop dist build > --- > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9523) Build application catalog docker image as part of hadoop dist build
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9523: Summary: Build application catalog docker image as part of hadoop dist build (was: Rename docker profile to dist profile) > Build application catalog docker image as part of hadoop dist build > --- > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830658#comment-16830658 ] Eric Yang commented on YARN-9523: - [~jeagles], [~ebadger] you are welcome to chime in to HDDS-1458. I have intention to add --privileged as a option to start-build-env.sh and add privileged requirement for integration test with profile activation. You are welcome to add your feedback to ensure that those patches are done with community consensus. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830586#comment-16830586 ] Jonathan Eagles commented on YARN-9523: --- bq. please work on changing the JIRA summary to indicate this is not a simple renaming, but a large change in the build process [~eyang], please address the JIRA title. It is very misleading what this jira is about. bq. I am working on Ozone, and the same discussion has surfaced in Ozone community. The suggestion came up to use dist profile to build docker image. If there is a relevant discussion thread, please reference it here. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-9523: --- Assignee: Eric Yang > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830578#comment-16830578 ] Eric Yang commented on YARN-9523: - [~jeagles] With all due respect, this issue will not be committed unless there is consensus. I am working on Ozone, and the same discussion has surfaced in the Ozone community. The suggestion came up to use the dist profile to build the docker image. It seems like a reasonable change to ensure that the Hadoop community can build a distro that includes docker images. I would like to bring this up at the bi-weekly YARN docker meeting for discussion again because multiple subprojects face a similar problem. It's best to solve this problem in a consistent manner instead of having the Ozone project build in kubernetes, HDFS in docker-compose, and YARN in docker. Without consistency, it impacts the techniques that are used for integration tests. It would be best to talk about this option upfront. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830545#comment-16830545 ] Jonathan Eagles commented on YARN-9523: --- Pointing out the highly relevant community discussion that took place, which illustrates that, at that time, a majority was not in favor of enabling docker image creation by default. https://lists.apache.org/thread.html/c63f404bc44f8f249cbc98ee3f6633384900d07e2308008fe4620150@%3Ccommon-dev.hadoop.apache.org%3E [~eyang] please work on changing the JIRA summary to indicate this is not a simple renaming but a large change in the build process. A misleading JIRA summary may prevent community discussion and fail to gain the notice of interested community stakeholders. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9523) Rename docker profile to dist profile
Eric Yang created YARN-9523: --- Summary: Rename docker profile to dist profile Key: YARN-9523 URL: https://issues.apache.org/jira/browse/YARN-9523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Yang It would be nice to make Application catalog docker image as part of the distribution. The suggestion is to change from: {code:java} mvn clean package -Pnative,dist,docker{code} to {code:java} mvn clean package -Pnative,dist{code} User can still build tarball only using: {code:java} mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830417#comment-16830417 ] Hadoop QA commented on YARN-9440: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 58s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 3s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 189 unchanged - 19 fixed = 194 total (was 208) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 42s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 47s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}155m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9440 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967487/YARN-9440.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c5fb9b424c3b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dead9b4 | | maven | version:
[jira] [Comment Edited] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830398#comment-16830398 ] Sudhir Babu Pothineni edited comment on YARN-9520 at 4/30/19 3:29 PM: -- Thanks [~yufeigu], I hadn't thought about the FIFO policy, will try it. Could you please clarify whether the following are true with FIFO policy preemption? * Inter-queue preemption will not happen among applications within the same queue. * With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted. * Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish? was (Author: sbpothineni): Thanks [~yufeigu], I hadn't thought about the FIFO policy, will try it. Could you please clarify whether the following are true with FIFO preemption? * Inter-queue preemption will not happen among applications within the same queue. * With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted. * Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish? > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > Its good to have inter-queue-preemption-enabled, > intra-queue-preemption-enabled options for fair scheduler, i have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830398#comment-16830398 ] Sudhir Babu Pothineni commented on YARN-9520: - Thanks [~yufeigu], I hadn't thought about the FIFO policy, will try it. Could you please clarify whether the following are true with FIFO preemption? * Inter-queue preemption will not happen among applications within the same queue. * With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted. * Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish? > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > Its good to have inter-queue-preemption-enabled, > intra-queue-preemption-enabled options for fair scheduler, i have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830347#comment-16830347 ] Jim Brennan commented on YARN-9518: --- [~shurong.mai], your patch needs to be based on branch trunk. I tried applying your branch to my local version of trunk, and it does not apply. See [https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute] Also see the patch naming convention - it needs to be something like: YARN-9518.001.patch to be picked up by the automated tests. I was not suggesting that this issue was fixed by YARN-5301 - there have been a few other changes since then. It just looks like your current patch is based on code from before YARN-5301. > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > {code:java} > cat /etc/redhat-release > CentOS Linux release 7.3.1611 (Core) > {code} > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated by container-executor as > "/sys/fs/cgroup/cpu" rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at >
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830306#comment-16830306 ] Szilard Nemeth commented on YARN-9477: -- Thanks [~pbacsko] for the quick update! The latest patch of this POC looks good to me, I think you can move forward with testing. > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch, > YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
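To make the "minimal layer around libudev using JNA" idea more concrete, a rough sketch follows. It is an assumption, not the YARN-9477 patch: the interface name is invented, it presumes JNA 5.x (Native.load) is on the classpath, and it assumes libudev is resolvable as "udev" on the system library path.
{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

// Bind only the handful of libudev calls needed to map a device number to its
// sysfs path, instead of wrapping the whole library.
public interface UdevLibrary extends Library {
  UdevLibrary INSTANCE = Native.load("udev", UdevLibrary.class);

  Pointer udev_new();                                  // create a udev context
  Pointer udev_unref(Pointer udev);                    // release the context
  Pointer udev_device_new_from_devnum(Pointer udev, byte type, long devnum);
  String udev_device_get_syspath(Pointer device);      // e.g. /sys/devices/...
  Pointer udev_device_unref(Pointer device);           // release the device handle
}
{code}
A caller would stat the /dev node to obtain its device number, pass it to udev_device_new_from_devnum (type 'c' for a character device), read the returned syspath, and unref both handles, which is essentially what the existing pyudev script does.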
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830297#comment-16830297 ] Peter Bacsko commented on YARN-9477: Thanks [~snemeth] for the comments. _In VEDeviceDiscoverer#getDeviceState: Please use a more descriptive and detailed error message rather than "Unknown "_ That string was taken from the Python script given to us by NEC and will not be printed as an error message, so I think it's good. _Could you please add some testcases for VEDeviceDiscoverer?_ Yes, once this POC is considered to be good, I'll upload them as non-POC. > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch, > YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9477: --- Attachment: YARN-9477-POC3.patch > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch, > YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830272#comment-16830272 ] Szilard Nemeth commented on YARN-9477: -- Hi [~pbacsko]! Some comments: 1. In UdevUtil#getSysPath: the variable sysPath is redundant, you can return it immediately without declaring it. 2. Similarly in VEDeviceDiscoverer#getDevicesFromPath, you can return the value at the end of the method immediately. 3. After the closing curly bracket of UdevUtil.LibUdev#init, there's an additional semicolon, please remove it! 4. In VEDeviceDiscoverer#getDeviceState: Please use a more descriptive and detailed error message rather than "Unknown ". 5. In VEDeviceDiscoverer: You use the String "ONLINE" both in the DEVICE_STATE array and on its own in toDevice method. Please define a static final String constant for this one, at least. I think the rest of the string values in DEVICE_STATE can remain intact. 6. Could you please add some testcases for VEDeviceDiscoverer? I think you should use a temporary directory for test files and you should test if the devices are parsed correctly (toDevice method). I can take another look once you fixed these above. Thanks! > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: YARN-9440.003.patch > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch, > YARN-9440.003.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830264#comment-16830264 ] Tao Yang commented on YARN-9440: Thanks [~cheersyang] for the advice when we talked on the phone. Attached the v3 patch, which refactors the DiagnosticsCollector interface and replaces its implementations (ResourceDiagnosticsCollector/PlacementConstraintsDiagnosticsCollector) with GenericDiagnosticsCollector. It also moves the collection logic for resource diagnostics out of the ResourceCalculator so that basic classes are not affected, while still passing an optional collector down the calling stack of PC verification to avoid a performance regression: diagnostics are collected only when necessary (activities enabled for the current node/app and the PC/partition mismatched). > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
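For readers following along, the shape of the refactoring described in the comment could be sketched roughly as below. All names and signatures here are illustrative assumptions, not the contents of the v3 patch; the point is that one generic collector replaces the per-type implementations and is passed as an Optional so diagnostics are built only when needed.
{code:java}
import java.util.Optional;

/** Illustrative sketch only; not the v3 patch. */
interface DiagnosticsCollector {
  void collect(String diagnostic, String details);
}

/** One generic implementation instead of resource/PC-specific collectors. */
class GenericDiagnosticsCollector implements DiagnosticsCollector {
  private final StringBuilder buffer = new StringBuilder();

  @Override
  public void collect(String diagnostic, String details) {
    buffer.append(diagnostic).append(": ").append(details).append('\n');
  }

  String getDiagnostics() {
    return buffer.toString();
  }
}

/**
 * The optional collector is threaded through the placement-constraint check,
 * so collection happens only when activities are enabled and the check fails.
 */
final class PlacementConstraintCheckSketch {
  static boolean canSatisfy(boolean constraintMatches,
      Optional<DiagnosticsCollector> collector) {
    if (!constraintMatches) {
      collector.ifPresent(c -> c.collect(
          "Placement constraint not satisfied",
          "allocation tags or partition mismatch"));
    }
    return constraintMatches;
  }
}
{code}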
[jira] [Updated] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9477: --- Attachment: YARN-9477-POC2.patch > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: The OS version is CentOS 7. {code:java} cat /etc/redhat-release CentOS Linux release 7.3.1611 (Core) {code} When I had set the configuration variables for cgroups with YARN, the nodemanager could be started without any problem. But when I ran a job, the job failed with the exceptional nodemanager logs shown at the end. In these logs, the important line is " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory ". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so the cpu and cpuacct subsystem paths it obtains are both "/sys/fs/cgroup/cpu,cpuacct". The resource description argument of container-executor is then as follows: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore, the cgroup path is truncated by container-executor to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and it reports the error " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory " in the log. Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job can run successfully. 
The patch is universally applicable to cgroup subsystem paths, such as cgroup network subsystem as follows: {code:java} /sys/fs/cgroup/net_cls -> net_cls,net_prio /sys/fs/cgroup/net_prio -> net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio{code} ## {panel:title=exceptional nodemanager logs:} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.She
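To make the idea of the patch concrete, the sketch below shows the kind of path normalization the description argues for; the class and method names are invented for illustration and are not the attached YARN-9518.patch. When /proc/mounts reports a combined controller mount such as cpu,cpuacct, the per-controller symlink (e.g. /sys/fs/cgroup/cpu) is preferred so that no comma reaches the container-executor argument.
{code:java}
import java.io.File;

/** Illustrative sketch only; not the attached patch. */
final class CGroupControllerPathSketch {

  /**
   * Returns a comma-free path for the given controller. If /proc/mounts
   * reported a combined mount point such as /sys/fs/cgroup/cpu,cpuacct,
   * fall back to the per-controller symlink (/sys/fs/cgroup/cpu) when it
   * exists, since on CentOS 7 it points at the combined hierarchy anyway.
   */
  static String resolve(String controller, String mountPointFromProcMounts) {
    if (!mountPointFromProcMounts.contains(",")) {
      return mountPointFromProcMounts;
    }
    File perControllerLink =
        new File(new File(mountPointFromProcMounts).getParentFile(), controller);
    return perControllerLink.exists()
        ? perControllerLink.getAbsolutePath()
        : mountPointFromProcMounts;
  }

  public static void main(String[] args) {
    // Prints /sys/fs/cgroup/cpu on a CentOS 7 style layout.
    System.out.println(resolve("cpu", "/sys/fs/cgroup/cpu,cpuacct"));
  }
}
{code}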
[jira] [Commented] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830174#comment-16830174 ] Hadoop QA commented on YARN-9521: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 14s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 47s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9521 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967474/YARN-9521.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 55cd210c9e78 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7fbaa7d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24032/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-api.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24032/te
[jira] [Updated] (YARN-9522) AppBlock ignores fully qualified class name of PseudoAuthenticationHandler
[ https://issues.apache.org/jira/browse/YARN-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9522: Affects Version/s: 3.2.0 > AppBlock ignores fully qualified class name of PseudoAuthenticationHandler > - > > Key: YARN-9522 > URL: https://issues.apache.org/jira/browse/YARN-9522 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > > {{AuthenticationHandler}} can be configured using either the FQCN or the type. > {{AppBlock}} checks only for the type "simple" and ignores the FQCN of > {{PseudoAuthenticationHandler}} while checking whether the UI is secured or not. > {code} >* @param authHandler The short-name (or fully qualified class name) of the >* authentication handler. > {code} > *AppBlock.java* > {code} > // check if UI is unsecured. > String httpAuth = > conf.get(CommonConfigurationKeys.HADOOP_HTTP_AUTHENTICATION_TYPE); > this.unsecuredUI = (httpAuth != null) && httpAuth.equals("simple"); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9522) AppBlock ignores fully qualified class name of PseudoAuthenticationHandler
Prabhu Joseph created YARN-9522: --- Summary: AppBlock ignores fully qualified class name of PseudoAuthenticationHandler Key: YARN-9522 URL: https://issues.apache.org/jira/browse/YARN-9522 Project: Hadoop YARN Issue Type: Bug Reporter: Prabhu Joseph Assignee: Prabhu Joseph {{AuthenticationHandler}} can be configured using either the FQCN or the type. {{AppBlock}} checks only for the type "simple" and ignores the FQCN of {{PseudoAuthenticationHandler}} while checking whether the UI is secured or not. {code} * @param authHandler The short-name (or fully qualified class name) of the * authentication handler. {code} *AppBlock.java* {code} // check if UI is unsecured. String httpAuth = conf.get(CommonConfigurationKeys.HADOOP_HTTP_AUTHENTICATION_TYPE); this.unsecuredUI = (httpAuth != null) && httpAuth.equals("simple"); {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
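A possible shape of the fix, shown here only as a hedged sketch (the UiAuthCheck helper is invented for illustration and is not the eventual patch): accept either the "simple" short name or the fully qualified class name of PseudoAuthenticationHandler when deciding whether the UI is unsecured.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeys;
import org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler;

// Illustrative sketch only; not the actual AppBlock change.
final class UiAuthCheck {
  static boolean isUnsecuredUI(Configuration conf) {
    String httpAuth =
        conf.get(CommonConfigurationKeys.HADOOP_HTTP_AUTHENTICATION_TYPE);
    // Unsecured when the handler is the "simple" type or its fully qualified class name.
    return httpAuth != null
        && (httpAuth.equals("simple")
            || httpAuth.equals(PseudoAuthenticationHandler.class.getName()));
  }
}
{code}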
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830140#comment-16830140 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], I have read YARN-5301 and its patch, and I don't think it is the same problem. YARN-5301 is about -mount-cgroups failing when automatic cgroup mounting is enabled, while this issue is about the resource description argument of container-executor, where the cgroup path is truncated because of the comma in the path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks". Therefore, this issue is a different problem from YARN-5301. > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated by container-executor as > "/sys/fs/cgroup/cpu" rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: The OS version is CentOS 7. When I had set the configuration variables for cgroups with YARN, the nodemanager could be started without any problem. But when I ran a job, the job failed with the exceptional nodemanager logs shown at the end. In these logs, the important line is " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory ". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so the cpu and cpuacct subsystem paths it obtains are both "/sys/fs/cgroup/cpu,cpuacct". The resource description argument of container-executor is then as follows: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore, the cgroup path is truncated by container-executor to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and it reports the error " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory " in the log. Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job can run successfully. 
The patch is universally applicable to cgroup subsystem paths, such as cgroup network subsystem as follows: {code:java} /sys/fs/cgroup/net_cls -> net_cls,net_prio /sys/fs/cgroup/net_prio -> net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio{code} ## {panel:title=exceptional nodemanager logs:} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoo
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830132#comment-16830132 ] Hadoop QA commented on YARN-9518: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} YARN-9518 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9518 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967465/YARN-9518.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24031/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776
[jira] [Updated] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-9521: --- Attachment: YARN-9521.001.patch > RM filed to start due to system services > > > Key: YARN-9521 > URL: https://issues.apache.org/jira/browse/YARN-9521 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-9521.001.patch > > > when starting RM, listing system services directory has failed as follows. > {code} > 2019-04-30 17:18:25,441 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory > is configured to /services > 2019-04-30 17:18:25,467 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation > initialized to yarn (auth:SIMPLE) > 2019-04-30 17:18:25,467 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in > state STARTED > org.apache.hadoop.service.ServiceStateException: java.io.IOException: > Filesystem closed > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501) > Caused by: java.io.IOException: Filesystem closed > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375) > at > 
org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > ... 13 more > {code} > it looks like due to the usage of filesystem cache. > this issue does not happen, when I add "fs.hdfs.impl.disable.cache=true" to > yarn-site -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9521) RM failed to start due to system services
kyungwan nam created YARN-9521: -- Summary: RM filed to start due to system services Key: YARN-9521 URL: https://issues.apache.org/jira/browse/YARN-9521 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.2 Reporter: kyungwan nam when starting RM, listing system services directory has failed as follows. {code} 2019-04-30 17:18:25,441 INFO client.SystemServiceManagerImpl (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory is configured to /services 2019-04-30 17:18:25,467 INFO client.SystemServiceManagerImpl (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation initialized to yarn (auth:SIMPLE) 2019-04-30 17:18:25,467 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in state STARTED org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501) Caused by: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187) at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375) at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282) at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) ... 
13 more {code} It looks like this is caused by the use of the FileSystem cache. This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to yarn-site. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
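The stack trace above points at the JVM-wide FileSystem cache: the cached DistributedFileSystem instance was closed by some other component before the system services scan ran. Besides the fs.hdfs.impl.disable.cache=true workaround, a common pattern is to use a private FileSystem instance for the scan, as in the hedged sketch below (the class and method names are illustrative, not the attached YARN-9521.001.patch).
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

/** Illustrative sketch only; not the attached patch. */
final class SystemServicesScanSketch {
  static void scan(Configuration conf, Path systemServiceDir) throws Exception {
    // FileSystem.newInstance bypasses the shared cache used by FileSystem.get,
    // so another component closing the cached instance cannot break this listing.
    try (FileSystem fs = FileSystem.newInstance(systemServiceDir.toUri(), conf)) {
      RemoteIterator<FileStatus> it = fs.listStatusIterator(systemServiceDir);
      while (it.hasNext()) {
        System.out.println(it.next().getPath());
      }
    }
  }
}
{code}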
[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830103#comment-16830103 ] Shurong Mai edited comment on YARN-9518 at 4/30/19 9:44 AM: [~Jim_Brennan], I have read the source code about these in version 2.7.7, 2.8.5, 2.9.2, 3.2.0. There are same problem with cgroup CPU subsystem path with comma as "/sys/fs/cgroup/cpu,cpuacct". was (Author: shurong.mai): [~Jim_Brennan], I have red the source code about these in version 2.7.7, 2.8.5, 2.9.2, 3.2.0. There are same problem with cgroup CPU subsystem path with comma as "/sys/fs/cgroup/cpu,cpuacct". > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830103#comment-16830103 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], I have red the source code about these in version 2.7.7, 2.8.5, 2.9.2, 3.2.0. There are same problem with cgroup CPU subsystem path with comma as "/sys/fs/cgroup/cpu,cpuacct". > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolEx
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: YARN-9518.patch > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > contain
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: (was: YARN-9518.patch) > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2019-04-19 20:17
[jira] [Updated] (YARN-9475) Create basic VE plugin
[ https://issues.apache.org/jira/browse/YARN-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9475: --- Fix Version/s: 3.3.0 > Create basic VE plugin > -- > > Key: YARN-9475 > URL: https://issues.apache.org/jira/browse/YARN-9475 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: 3.3.0 > Fix For: 3.3.0 > > Attachments: YARN-9475-001.patch, YARN-9475-002.patch, > YARN-9475-003.patch, YARN-9475-004.patch, YARN-9475-005.patch, > YARN-9475-006.patch, YARN-9475-007.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830057#comment-16830057 ] Shurong Mai commented on YARN-9518: --- Hi, [~Jim_Brennan] , does "latest code (trunk)" mean the latest version, for example hadoop-2.9.2, hadoop-3.2.0 ? > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to merged cgroup subsystem paths, such as the cgroup network subsystems: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING > 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1554210318404_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread
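To make the workaround above concrete, here is a minimal Java sketch of the idea from the YARN-9518 description (it is not the attached YARN-9518.patch, and the class and method names are hypothetical). Given a controller mount point read from /proc/mounts, it prefers the per-controller symlink (e.g. /sys/fs/cgroup/cpu) whenever the mount point is a merged hierarchy such as /sys/fs/cgroup/cpu,cpuacct, so the path passed to container-executor never contains a comma:
{code:java}
import java.io.File;

// Hypothetical helper illustrating the YARN-9518 idea; not the actual patch.
public final class CGroupControllerPathSketch {

  private CGroupControllerPathSketch() {
  }

  /**
   * @param mountPath  controller mount point from /proc/mounts,
   *                   e.g. "/sys/fs/cgroup/cpu,cpuacct"
   * @param controller the single controller of interest, e.g. "cpu"
   * @return a comma-free path for the controller if one exists,
   *         otherwise the original mount path
   */
  static String controllerPath(String mountPath, String controller) {
    if (!mountPath.contains(",")) {
      // Already unambiguous, e.g. the CentOS 6 layout /sys/fs/cgroup/cpu.
      return mountPath;
    }
    File perController =
        new File(new File(mountPath).getParentFile(), controller);
    // On CentOS 7, /sys/fs/cgroup/cpu is a symlink to cpu,cpuacct, so this
    // resolves to the same hierarchy without a comma in the path.
    if (perController.exists()) {
      return perController.getAbsolutePath();
    }
    return mountPath;
  }

  public static void main(String[] args) {
    // Prints /sys/fs/cgroup/cpu on a CentOS 7 style layout.
    System.out.println(controllerPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
  }
}
{code}
The same mapping covers the merged network subsystems mentioned above: controllerPath("/sys/fs/cgroup/net_cls,net_prio", "net_cls") would yield /sys/fs/cgroup/net_cls.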
[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830051#comment-16830051 ] Szilard Nemeth commented on YARN-9519: -- Hi [~adam.antal]! Thanks for this patch! +1 (non-binding) > TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9519.001.patch > > > The TFile log aggregation file format is not sensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config. > In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file formats, such as IFile, default to the file-format-specific config, which therefore takes priority. > From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (These configs are defined as:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static final String NM_REMOTE_APP_LOG_DIR = > NM_PREFIX + "remote-app-log-dir"; > {code} > I suggest TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and fall back to the yarn.nodemanager.remote-app-log-dir config only if that is not specified.
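For illustration, the change suggested in the description could look roughly like the following sketch, which mirrors the IFile snippet quoted above (this is not the attached YARN-9519.001.patch, and the helper class and method names are hypothetical):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical helper showing the suggested lookup order for TFile.
public class RemoteLogDirResolverSketch {

  /** Resolves the remote app-log dir for a controller name such as "TFile". */
  static Path resolveRemoteRootLogDir(Configuration conf,
      String fileControllerName) {
    // 1. Try the controller-specific key, e.g.
    //    yarn.log-aggregation.TFile.remote-app-log-dir
    String remoteDirKey = String.format(
        YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
        fileControllerName);
    String remoteDir = conf.get(remoteDirKey);
    if (remoteDir == null || remoteDir.isEmpty()) {
      // 2. Fall back to yarn.nodemanager.remote-app-log-dir and its default.
      remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
          YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
    }
    return new Path(remoteDir);
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.log-aggregation.TFile.remote-app-log-dir",
        "/custom/tfile-logs");
    // Prints /custom/tfile-logs; without the override it would print the
    // NM-wide remote-app-log-dir default instead.
    System.out.println(resolveRemoteRootLogDir(conf, "TFile"));
  }
}
{code}
With this lookup order, TFile honours yarn.log-aggregation.TFile.remote-app-log-dir when it is set and otherwise behaves exactly as it does today.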
[jira] [Updated] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9519: - Description: The TFile log aggregation file format is not sensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config. In {{LogAggregationTFileController$initInternal}}: {code:java} this.remoteRootLogDir = new Path( conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); {code} So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file format, like IFile defaults to the file format config, so its priority is higher. >From {{LogAggregationIndexedFileController$initInternal}}: {code:java} String remoteDirStr = String.format( YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, this.fileControllerName); String remoteDir = conf.get(remoteDirStr); if (remoteDir == null || remoteDir.isEmpty()) { remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); } {code} (Where these configs are: ) {code:java} public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; public static final String NM_REMOTE_APP_LOG_DIR = NM_PREFIX + "remote-app-log-dir"; {code} I suggest TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and only if that is not specified falls back to the yarn.nodemanager.remote-app-log-dir config. was: The TFile log aggregation file format is not sensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config. In {{LogAggregationTFileController$initInternal}}: {code:java} this.remoteRootLogDir = new Path( conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); {code} So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file format, like IFile defaults to the file format config, so its priority is bigger. >From {{LogAggregationIndexedFileController$initInternal}}: {code:java} String remoteDirStr = String.format( YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, this.fileControllerName); String remoteDir = conf.get(remoteDirStr); if (remoteDir == null || remoteDir.isEmpty()) { remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); } {code} (Where these configs are: ) {code:java} public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; public static final String NM_REMOTE_APP_LOG_DIR = NM_PREFIX + "remote-app-log-dir"; {code} I suggest TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and only if that is not specified falls back to the yarn.nodemanager.remote-app-log-dir config. > TFile log aggregation file format is insensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9519.001.patch > > > The TFile log aggregation file format is not sensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config. 
> In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file formats, such as IFile, default to the file-format-specific config, which therefore takes priority. > From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (These configs are defined as:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static f