[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830859#comment-16830859 ] Hadoop QA commented on YARN-9440: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 33s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 34m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 6m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 31m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 7s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 40s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 189 unchanged - 19 fixed = 194 total (was 208) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 1s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 0s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}217m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9440 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967542/YARN-9440.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 26fbf1efdd6b 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personal
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830815#comment-16830815 ] Prabhu Joseph commented on YARN-6929: - Thanks [~eyang]. > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code:java} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) >
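To make the proposed layout concrete: the idea is to bucket aggregated logs under a date subdirectory so that no single logs directory accumulates more children than the NameNode's dfs.namenode.fs-limits.max-directory-items limit (1048576 by default) allows. The sketch below is illustrative only; the class, method, and placeholder names are assumptions and are not taken from the YARN-6929 patches.
{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import org.apache.hadoop.fs.Path;

// Hypothetical helper, not the actual YARN-6929 change: build the per-application
// aggregated-log directory with an extra date bucket so each directory stays well
// under the NameNode's max-directory-items limit.
public class DateBucketedLogDir {
  private static final DateTimeFormatter BUCKET_FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMdd");

  /** e.g. {remoteRootLogDir}/{user}/logs/20190430/{applicationId} */
  public static Path appLogDir(Path remoteRootLogDir, String user, String applicationId) {
    String bucket = LocalDate.now().format(BUCKET_FORMAT);
    return new Path(new Path(new Path(remoteRootLogDir, user), "logs/" + bucket), applicationId);
  }
}
{code}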
[jira] [Updated] (YARN-9507) Fix NPE in NodeManager#serviceStop on startup failure
[ https://issues.apache.org/jira/browse/YARN-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-9507: --- Summary: Fix NPE in NodeManager#serviceStop on startup failure (was: Fix NPE if NM fails to init) > Fix NPE in NodeManager#serviceStop on startup failure > - > > Key: YARN-9507 > URL: https://issues.apache.org/jira/browse/YARN-9507 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-9507-001.patch > > > 2019-04-24 14:06:44,101 WARN org.apache.hadoop.service.AbstractService: When > stopping the service NodeManager > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:492) > at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:947) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1018) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
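For context on the failure mode above: serviceInit() throws partway through startup, AbstractService.init() then calls stopQuietly(), and serviceStop() dereferences state that was never created. The usual fix is to null-guard such fields in serviceStop(); the sketch below shows that pattern with an invented service and field name, and is not the contents of YARN-9507-001.patch.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Illustrative pattern only: a service whose stop() tolerates a failed init().
public class SampleService extends AbstractService {
  private AutoCloseable resource; // hypothetical field, created during serviceInit()

  public SampleService() {
    super("SampleService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    resource = openResource(); // may throw before 'resource' is ever assigned
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStop() throws Exception {
    // Guard against a partially initialized service; without this check,
    // stop() after a failed init() throws a NullPointerException like the one above.
    if (resource != null) {
      resource.close();
    }
    super.serviceStop();
  }

  private AutoCloseable openResource() {
    return () -> { };
  }
}
{code}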
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: YARN-9440.003.patch > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch, > YARN-9440.003.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: (was: YARN-9440.003.patch) > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch, > YARN-9440.003.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830769#comment-16830769 ] Yufei Gu commented on YARN-9520: * "Inter-queue preemption will not happen among applications within the same queue." Yes. * "With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted." No. Only the oldest one has a lower chance of being preempted; all the others have the same chance. * "Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish?" Yes. The 3rd one can run. > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > Its good to have inter-queue-preemption-enabled, > intra-queue-preemption-enabled options for fair scheduler, i have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830714#comment-16830714 ] Hudson commented on YARN-6929: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16482 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16482/]) YARN-6929. Improved partition algorithm for yarn remote-app-log-dir. (eyang: rev accb811e5727f2a780a41cd5e50bab47a0cccb68) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestContainerLogsUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileControllerFactory.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogAggregationUtils.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/LogAggregationIndexedFileController.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogDeletionService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogDeletionService.java > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. 
This can be > improved with adding date as a subdirectory like > //logs// > {code:java} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830687#comment-16830687 ] Eric Yang commented on YARN-6929: - +1 Patch 11 looks good to me. > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929-009.patch, YARN-6929-010.patch, YARN-6929-011.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code:java} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop
[jira] [Comment Edited] (YARN-9523) Build application catalog docker image as part of hadoop dist build
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830658#comment-16830658 ] Eric Yang edited comment on YARN-9523 at 4/30/19 8:21 PM: -- [~jeagles], [~ebadger] you are welcome to chime in to HDDS-1458. I intend to add --privileged as an option to start-build-env.sh and to add the privileged requirement for integration tests with profile activation. You are welcome to add your feedback to ensure that those patches are done with community consensus. {quote}If there is a relevant discussion thread, please reference it here.{quote} The plan is in HADOOP-16091 and is on hold until we process the outcome of the previous iteration. Conversations took place in Cloudera internal Slack and web meetings. The idea is to move forward only after we think through our own thought process and current obstacles. was (Author: eyang): [~jeagles], [~ebadger] you are welcome to chime in to HDDS-1458. I intend to add --privileged as an option to start-build-env.sh and to add the privileged requirement for integration tests with profile activation. You are welcome to add your feedback to ensure that those patches are done with community consensus. > Build application catalog docker image as part of hadoop dist build > --- > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9523) Build application catalog docker image as part of hadoop dist build
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9523: Summary: Build application catalog docker image as part of hadoop dist build (was: Rename docker profile to dist profile) > Build application catalog docker image as part of hadoop dist build > --- > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830658#comment-16830658 ] Eric Yang commented on YARN-9523: - [~jeagles], [~ebadger] you are welcome to chime in to HDDS-1458. I have intention to add --privileged as a option to start-build-env.sh and add privileged requirement for integration test with profile activation. You are welcome to add your feedback to ensure that those patches are done with community consensus. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830586#comment-16830586 ] Jonathan Eagles commented on YARN-9523: --- bq. please work on changing the JIRA summary to indicate this is not a simple renaming, but a large change in the build process [~eyang], please address the JIRA title. It is very misleading what this jira is about. bq. I am working on Ozone, and the same discussion has surfaced in Ozone community. The suggestion came up to use dist profile to build docker image. If there is a relevant discussion thread, please reference it here. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-9523: --- Assignee: Eric Yang > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830578#comment-16830578 ] Eric Yang commented on YARN-9523: - [~jeagles] With all due respect, this issue will not be committed unless there is consensus. I am working on Ozone, and the same discussion has surfaced in the Ozone community. The suggestion came up to use the dist profile to build the docker image. It seems like a reasonable change to ensure that the Hadoop community can build a distro that includes docker images. I would like to bring this up at the bi-weekly YARN docker meeting for discussion again because multiple subprojects face a similar problem. It's best to solve this problem in a consistent manner instead of having the Ozone project build in kubernetes, HDFS in docker-compose, and YARN in docker. Without consistency, it impacts the techniques that are used for integration tests. It would be best to talk about this option upfront. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9523) Rename docker profile to dist profile
[ https://issues.apache.org/jira/browse/YARN-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830545#comment-16830545 ] Jonathan Eagles commented on YARN-9523: --- Pointing out the highly relevant community discussion that took place, which illustrates that, at that time, a majority was not in favor of enabling docker image creation by default. https://lists.apache.org/thread.html/c63f404bc44f8f249cbc98ee3f6633384900d07e2308008fe4620150@%3Ccommon-dev.hadoop.apache.org%3E [~eyang] please work on changing the JIRA summary to indicate this is not a simple renaming but a large change in the build process. A misleading JIRA summary may prevent community discussion and fail to gain the notice of interested community stakeholders. > Rename docker profile to dist profile > - > > Key: YARN-9523 > URL: https://issues.apache.org/jira/browse/YARN-9523 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Priority: Major > > It would be nice to make Application catalog docker image as part of the > distribution. The suggestion is to change from: > {code:java} > mvn clean package -Pnative,dist,docker{code} > to > {code:java} > mvn clean package -Pnative,dist{code} > User can still build tarball only using: > {code:java} > mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9523) Rename docker profile to dist profile
Eric Yang created YARN-9523: --- Summary: Rename docker profile to dist profile Key: YARN-9523 URL: https://issues.apache.org/jira/browse/YARN-9523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Yang It would be nice to make Application catalog docker image as part of the distribution. The suggestion is to change from: {code:java} mvn clean package -Pnative,dist,docker{code} to {code:java} mvn clean package -Pnative,dist{code} User can still build tarball only using: {code:java} mvn clean package -DskipDocker -DskipTests -DskipShade -Pnative,dist{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830417#comment-16830417 ] Hadoop QA commented on YARN-9440: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 54s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 58s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 3s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 189 unchanged - 19 fixed = 194 total (was 208) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 42s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 47s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}155m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9440 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967487/YARN-9440.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c5fb9b424c3b 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dead9b4 | | maven | version:
[jira] [Comment Edited] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830398#comment-16830398 ] Sudhir Babu Pothineni edited comment on YARN-9520 at 4/30/19 3:29 PM: -- Thanks [~yufeigu], I hadn't thought about the FIFO policy, will try it. Could you please clarify whether the following are true with FIFO policy preemption? * Inter-queue preemption will not happen among applications within the same queue. * With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted. * Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish? was (Author: sbpothineni): Thanks [~yufeigu], I hadn't thought about the FIFO policy, will try it. Could you please clarify whether the following are true with FIFO preemption? * Inter-queue preemption will not happen among applications within the same queue. * With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted. * Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish? > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > Its good to have inter-queue-preemption-enabled, > intra-queue-preemption-enabled options for fair scheduler, i have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830398#comment-16830398 ] Sudhir Babu Pothineni commented on YARN-9520: - Thanks [~yufeigu], I hadn't thought about the FIFO policy, will try it. Could you please clarify whether the following are true with FIFO preemption? * Inter-queue preemption will not happen among applications within the same queue. * With the FIFO ordering policy, newer applications will be preempted first if the priority is the same or not set; in other words, older applications will be considered for preemption only after the newer applications have been preempted. * Multiple applications in a queue will run if resources are available. Say there are resources for 200 containers and 2 applications of 100 containers each are running; after 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish? > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > Its good to have inter-queue-preemption-enabled, > intra-queue-preemption-enabled options for fair scheduler, i have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830347#comment-16830347 ] Jim Brennan commented on YARN-9518: --- [~shurong.mai], your patch needs to be based on branch trunk. I tried applying your branch to my local version of trunk, and it does not apply. See [https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute] Also see the patch naming convention - it needs to be something like: YARN-9518.001.patch to be picked up by the automated tests. I was not suggesting that this issue was fixed by YARN-5301 - there have been a few other changes since then. It just looks like your current patch is based on code from before YARN-5301. > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > {code:java} > cat /etc/redhat-release > CentOS Linux release 7.3.1611 (Core) > {code} > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated by container-executor as > "/sys/fs/cgroup/cpu" rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at >
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830306#comment-16830306 ] Szilard Nemeth commented on YARN-9477: -- Thanks [~pbacsko] for the quick update! The latest patch of this POC looks good to me, I think you can move forward with testing. > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch, > YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
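To make the "minimal layer around libudev using JNA" idea more concrete, a rough sketch follows. It is an assumption, not the YARN-9477 patch: the interface name is invented, it presumes JNA 5.x (Native.load) is on the classpath, and it assumes libudev is resolvable as "udev" on the system library path.
{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

// Bind only the handful of libudev calls needed to map a device number to its
// sysfs path, instead of wrapping the whole library.
public interface UdevLibrary extends Library {
  UdevLibrary INSTANCE = Native.load("udev", UdevLibrary.class);

  Pointer udev_new();                                  // create a udev context
  Pointer udev_unref(Pointer udev);                    // release the context
  Pointer udev_device_new_from_devnum(Pointer udev, byte type, long devnum);
  String udev_device_get_syspath(Pointer device);      // e.g. /sys/devices/...
  Pointer udev_device_unref(Pointer device);           // release the device handle
}
{code}
A caller would stat the /dev node to obtain its device number, pass it to udev_device_new_from_devnum (type 'c' for a character device), read the returned syspath, and unref both handles, which is essentially what the existing pyudev script does.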
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830297#comment-16830297 ] Peter Bacsko commented on YARN-9477: Thanks [~snemeth] for the comments. _In VEDeviceDiscoverer#getDeviceState: Please use a more descriptive and detailed error message rather than "Unknown "_ That string was taken from the Python script given to us by NEC and will not be printed as an error message, so I think it's good. _Could you please add some testcases for VEDeviceDiscoverer?_ Yes, once this POC is considered to be good, I'll upload them as non-POC. > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch, > YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9477: --- Attachment: YARN-9477-POC3.patch > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch, > YARN-9477-POC3.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830272#comment-16830272 ] Szilard Nemeth commented on YARN-9477: -- Hi [~pbacsko]! Some comments: 1. In UdevUtil#getSysPath: the variable sysPath is redundant, you can return it immediately without declaring it. 2. Similarly in VEDeviceDiscoverer#getDevicesFromPath, you can return the value at the end of the method immediately. 3. After the closing curly bracket of UdevUtil.LibUdev#init, there's an additional semicolon, please remove it! 4. In VEDeviceDiscoverer#getDeviceState: Please use a more descriptive and detailed error message rather than "Unknown ". 5. In VEDeviceDiscoverer: You use the String "ONLINE" both in the DEVICE_STATE array and on its own in toDevice method. Please define a static final String constant for this one, at least. I think the rest of the string values in DEVICE_STATE can remain intact. 6. Could you please add some testcases for VEDeviceDiscoverer? I think you should use a temporary directory for test files and you should test if the devices are parsed correctly (toDevice method). I can take another look once you fixed these above. Thanks! > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: YARN-9440.003.patch > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch, > YARN-9440.003.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830264#comment-16830264 ] Tao Yang commented on YARN-9440: Thanks [~cheersyang] for the advice when we talked on the phone. Attached the v3 patch, which refactors the DiagnosticsCollector interface and replaces its implementations (ResourceDiagnosticsCollector/PlacementConstraintsDiagnosticsCollector) with GenericDiagnosticsCollector. It also moves the collection logic for resource diagnostics out of the ResourceCalculator so that basic classes are not affected, while still passing an optional collector down the calling stack of PC verification to avoid a performance regression: diagnostics are collected only when necessary (activities enabled for the current node/app and the PC/partition mismatched). > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch, YARN-9440.002.patch > > > [Design doc > #4.1|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
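For readers following along, the shape of the refactoring described in the comment could be sketched roughly as below. All names and signatures here are illustrative assumptions, not the contents of the v3 patch; the point is that one generic collector replaces the per-type implementations and is passed as an Optional so diagnostics are built only when needed.
{code:java}
import java.util.Optional;

/** Illustrative sketch only; not the v3 patch. */
interface DiagnosticsCollector {
  void collect(String diagnostic, String details);
}

/** One generic implementation instead of resource/PC-specific collectors. */
class GenericDiagnosticsCollector implements DiagnosticsCollector {
  private final StringBuilder buffer = new StringBuilder();

  @Override
  public void collect(String diagnostic, String details) {
    buffer.append(diagnostic).append(": ").append(details).append('\n');
  }

  String getDiagnostics() {
    return buffer.toString();
  }
}

/**
 * The optional collector is threaded through the placement-constraint check,
 * so collection happens only when activities are enabled and the check fails.
 */
final class PlacementConstraintCheckSketch {
  static boolean canSatisfy(boolean constraintMatches,
      Optional<DiagnosticsCollector> collector) {
    if (!constraintMatches) {
      collector.ifPresent(c -> c.collect(
          "Placement constraint not satisfied",
          "allocation tags or partition mismatch"));
    }
    return constraintMatches;
  }
}
{code}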
[jira] [Updated] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9477: --- Attachment: YARN-9477-POC2.patch > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch, YARN-9477-POC2.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: The OS version is CentOS 7. {code:java} cat /etc/redhat-release CentOS Linux release 7.3.1611 (Core) {code} When I had set the configuration variables for cgroups with YARN, the nodemanager could be started without any problem. But when I ran a job, the job failed with the exceptional nodemanager logs shown at the end. In these logs, the important line is " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory ". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so the cpu and cpuacct subsystem paths it obtains are both "/sys/fs/cgroup/cpu,cpuacct". The resource description argument of container-executor is then as follows: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore, the cgroup path is truncated by container-executor to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and it reports the error " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory " in the log. Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job can run successfully. 
The patch is universally applicable to cgroup subsystem paths, such as cgroup network subsystem as follows: {code:java} /sys/fs/cgroup/net_cls -> net_cls,net_prio /sys/fs/cgroup/net_prio -> net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio{code} ## {panel:title=exceptional nodemanager logs:} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.She
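To make the idea of the patch concrete, the sketch below shows the kind of path normalization the description argues for; the class and method names are invented for illustration and are not the attached YARN-9518.patch. When /proc/mounts reports a combined controller mount such as cpu,cpuacct, the per-controller symlink (e.g. /sys/fs/cgroup/cpu) is preferred so that no comma reaches the container-executor argument.
{code:java}
import java.io.File;

/** Illustrative sketch only; not the attached patch. */
final class CGroupControllerPathSketch {

  /**
   * Returns a comma-free path for the given controller. If /proc/mounts
   * reported a combined mount point such as /sys/fs/cgroup/cpu,cpuacct,
   * fall back to the per-controller symlink (/sys/fs/cgroup/cpu) when it
   * exists, since on CentOS 7 it points at the combined hierarchy anyway.
   */
  static String resolve(String controller, String mountPointFromProcMounts) {
    if (!mountPointFromProcMounts.contains(",")) {
      return mountPointFromProcMounts;
    }
    File perControllerLink =
        new File(new File(mountPointFromProcMounts).getParentFile(), controller);
    return perControllerLink.exists()
        ? perControllerLink.getAbsolutePath()
        : mountPointFromProcMounts;
  }

  public static void main(String[] args) {
    // Prints /sys/fs/cgroup/cpu on a CentOS 7 style layout.
    System.out.println(resolve("cpu", "/sys/fs/cgroup/cpu,cpuacct"));
  }
}
{code}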
[jira] [Commented] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830174#comment-16830174 ] Hadoop QA commented on YARN-9521: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 14s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 47s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9521 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967474/YARN-9521.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 55cd210c9e78 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7fbaa7d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24032/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-api.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24032/te
[jira] [Updated] (YARN-9522) AppBlock ignores fully qualified class name of PseudoAuthenticationHandler
[ https://issues.apache.org/jira/browse/YARN-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9522: Affects Version/s: 3.2.0 > AppBlock ignores fully qualified class name of PseudoAuthenticationHandler > - > > Key: YARN-9522 > URL: https://issues.apache.org/jira/browse/YARN-9522 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > > {{AuthenticationHandler}} can be configured using either the FQCN or the type. > {{AppBlock}} checks only for the type "simple" and ignores the FQCN of > {{PseudoAuthenticationHandler}} while checking whether the UI is secured or not. > {code} >* @param authHandler The short-name (or fully qualified class name) of the >* authentication handler. > {code} > *AppBlock.java* > {code} > // check if UI is unsecured. > String httpAuth = > conf.get(CommonConfigurationKeys.HADOOP_HTTP_AUTHENTICATION_TYPE); > this.unsecuredUI = (httpAuth != null) && httpAuth.equals("simple"); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9522) AppBlock ignores fully qualified class name of PseudoAuthenticationHandler
Prabhu Joseph created YARN-9522: --- Summary: AppBlock ignores fully qualified class name of PseudoAuthenticationHandler Key: YARN-9522 URL: https://issues.apache.org/jira/browse/YARN-9522 Project: Hadoop YARN Issue Type: Bug Reporter: Prabhu Joseph Assignee: Prabhu Joseph {{AuthenticationHandler}} can be configured using either the FQCN or the type. {{AppBlock}} checks only for the type "simple" and ignores the FQCN of {{PseudoAuthenticationHandler}} while checking whether the UI is secured or not. {code} * @param authHandler The short-name (or fully qualified class name) of the * authentication handler. {code} *AppBlock.java* {code} // check if UI is unsecured. String httpAuth = conf.get(CommonConfigurationKeys.HADOOP_HTTP_AUTHENTICATION_TYPE); this.unsecuredUI = (httpAuth != null) && httpAuth.equals("simple"); {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
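A possible shape of the fix, shown here only as a hedged sketch (the UiAuthCheck helper is invented for illustration and is not the eventual patch): accept either the "simple" short name or the fully qualified class name of PseudoAuthenticationHandler when deciding whether the UI is unsecured.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeys;
import org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler;

// Illustrative sketch only; not the actual AppBlock change.
final class UiAuthCheck {
  static boolean isUnsecuredUI(Configuration conf) {
    String httpAuth =
        conf.get(CommonConfigurationKeys.HADOOP_HTTP_AUTHENTICATION_TYPE);
    // Unsecured when the handler is the "simple" type or its fully qualified class name.
    return httpAuth != null
        && (httpAuth.equals("simple")
            || httpAuth.equals(PseudoAuthenticationHandler.class.getName()));
  }
}
{code}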
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830140#comment-16830140 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], I have read YARN-5301 and its patch, and I don't think it is the same problem. YARN-5301 is about -mount-cgroups failing when automatic cgroup mounting is enabled, while this issue is about the resource description argument of container-executor, where the cgroup path is truncated because of the comma in the path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks". Therefore, this issue is a different problem from YARN-5301. > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated by container-executor as > "/sys/fs/cgroup/cpu" rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: The OS version is CentOS 7. When I had set the configuration variables for cgroups with YARN, the nodemanager could be started without any problem. But when I ran a job, the job failed with the exceptional nodemanager logs shown at the end. In these logs, the important line is " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory ". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so the cpu and cpuacct subsystem paths it obtains are both "/sys/fs/cgroup/cpu,cpuacct". The resource description argument of container-executor is then as follows: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore, the cgroup path is truncated by container-executor to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and it reports the error " Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory " in the log. Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses the cgroup cpu path "/sys/fs/cgroup/cpu" rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job can run successfully. 
The patch is universally applicable to cgroup subsystem paths, such as cgroup network subsystem as follows: {code:java} /sys/fs/cgroup/net_cls -> net_cls,net_prio /sys/fs/cgroup/net_prio -> net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio{code} ## {panel:title=exceptional nodemanager logs:} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoo
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830132#comment-16830132 ] Hadoop QA commented on YARN-9518: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} YARN-9518 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9518 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967465/YARN-9518.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24031/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776
[jira] [Updated] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-9521: --- Attachment: YARN-9521.001.patch > RM filed to start due to system services > > > Key: YARN-9521 > URL: https://issues.apache.org/jira/browse/YARN-9521 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-9521.001.patch > > > when starting RM, listing system services directory has failed as follows. > {code} > 2019-04-30 17:18:25,441 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory > is configured to /services > 2019-04-30 17:18:25,467 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation > initialized to yarn (auth:SIMPLE) > 2019-04-30 17:18:25,467 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in > state STARTED > org.apache.hadoop.service.ServiceStateException: java.io.IOException: > Filesystem closed > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501) > Caused by: java.io.IOException: Filesystem closed > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375) > at > 
org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > ... 13 more > {code} > it looks like due to the usage of filesystem cache. > this issue does not happen, when I add "fs.hdfs.impl.disable.cache=true" to > yarn-site -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9521) RM failed to start due to system services
kyungwan nam created YARN-9521: -- Summary: RM filed to start due to system services Key: YARN-9521 URL: https://issues.apache.org/jira/browse/YARN-9521 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.2 Reporter: kyungwan nam when starting RM, listing system services directory has failed as follows. {code} 2019-04-30 17:18:25,441 INFO client.SystemServiceManagerImpl (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory is configured to /services 2019-04-30 17:18:25,467 INFO client.SystemServiceManagerImpl (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation initialized to yarn (auth:SIMPLE) 2019-04-30 17:18:25,467 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in state STARTED org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501) Caused by: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179) at org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187) at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375) at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282) at org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) ... 
13 more {code} It looks like this is caused by the use of the FileSystem cache. This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to yarn-site. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
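The stack trace above points at the JVM-wide FileSystem cache: the cached DistributedFileSystem instance was closed by some other component before the system services scan ran. Besides the fs.hdfs.impl.disable.cache=true workaround, a common pattern is to use a private FileSystem instance for the scan, as in the hedged sketch below (the class and method names are illustrative, not the attached YARN-9521.001.patch).
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

/** Illustrative sketch only; not the attached patch. */
final class SystemServicesScanSketch {
  static void scan(Configuration conf, Path systemServiceDir) throws Exception {
    // FileSystem.newInstance bypasses the shared cache used by FileSystem.get,
    // so another component closing the cached instance cannot break this listing.
    try (FileSystem fs = FileSystem.newInstance(systemServiceDir.toUri(), conf)) {
      RemoteIterator<FileStatus> it = fs.listStatusIterator(systemServiceDir);
      while (it.hasNext()) {
        System.out.println(it.next().getPath());
      }
    }
  }
}
{code}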
[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830103#comment-16830103 ] Shurong Mai edited comment on YARN-9518 at 4/30/19 9:44 AM: [~Jim_Brennan], I have read the source code about these in version 2.7.7, 2.8.5, 2.9.2, 3.2.0. There are same problem with cgroup CPU subsystem path with comma as "/sys/fs/cgroup/cpu,cpuacct". was (Author: shurong.mai): [~Jim_Brennan], I have red the source code about these in version 2.7.7, 2.8.5, 2.9.2, 3.2.0. There are same problem with cgroup CPU subsystem path with comma as "/sys/fs/cgroup/cpu,cpuacct". > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830103#comment-16830103 ] Shurong Mai commented on YARN-9518: --- [~Jim_Brennan], I have red the source code about these in version 2.7.7, 2.8.5, 2.9.2, 3.2.0. There are same problem with cgroup CPU subsystem path with comma as "/sys/fs/cgroup/cpu,cpuacct". > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolEx
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: YARN-9518.patch > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > contain
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Attachment: (was: YARN-9518.patch) > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to cgroup subsystem paths, such as cgroup > network subsystem as follows: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2019-04-19 20:17:20,108 INFO > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from > container-launch. > 2019-04-19 20:17
[jira] [Updated] (YARN-9475) Create basic VE plugin
[ https://issues.apache.org/jira/browse/YARN-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9475: --- Fix Version/s: 3.3.0 > Create basic VE plugin > -- > > Key: YARN-9475 > URL: https://issues.apache.org/jira/browse/YARN-9475 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: 3.3.0 > Fix For: 3.3.0 > > Attachments: YARN-9475-001.patch, YARN-9475-002.patch, > YARN-9475-003.patch, YARN-9475-004.patch, YARN-9475-005.patch, > YARN-9475-006.patch, YARN-9475-007.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830057#comment-16830057 ] Shurong Mai commented on YARN-9518: --- Hi, [~Jim_Brennan] , does "latest code (trunk)" mean the latest version, for example hadoop-2.9.2, hadoop-3.2.0 ? > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> The patch is universally applicable to merged cgroup subsystem paths, such as the cgroup network subsystems: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING > 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1554210318404_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread
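To make the workaround above concrete, here is a minimal Java sketch of the idea from the YARN-9518 description (it is not the attached YARN-9518.patch, and the class and method names are hypothetical). Given a controller mount point read from /proc/mounts, it prefers the per-controller symlink (e.g. /sys/fs/cgroup/cpu) whenever the mount point is a merged hierarchy such as /sys/fs/cgroup/cpu,cpuacct, so the path passed to container-executor never contains a comma:
{code:java}
import java.io.File;

// Hypothetical helper illustrating the YARN-9518 idea; not the actual patch.
public final class CGroupControllerPathSketch {

  private CGroupControllerPathSketch() {
  }

  /**
   * @param mountPath  controller mount point from /proc/mounts,
   *                   e.g. "/sys/fs/cgroup/cpu,cpuacct"
   * @param controller the single controller of interest, e.g. "cpu"
   * @return a comma-free path for the controller if one exists,
   *         otherwise the original mount path
   */
  static String controllerPath(String mountPath, String controller) {
    if (!mountPath.contains(",")) {
      // Already unambiguous, e.g. the CentOS 6 layout /sys/fs/cgroup/cpu.
      return mountPath;
    }
    File perController =
        new File(new File(mountPath).getParentFile(), controller);
    // On CentOS 7, /sys/fs/cgroup/cpu is a symlink to cpu,cpuacct, so this
    // resolves to the same hierarchy without a comma in the path.
    if (perController.exists()) {
      return perController.getAbsolutePath();
    }
    return mountPath;
  }

  public static void main(String[] args) {
    // Prints /sys/fs/cgroup/cpu on a CentOS 7 style layout.
    System.out.println(controllerPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
  }
}
{code}
The same mapping covers the merged network subsystems mentioned above: controllerPath("/sys/fs/cgroup/net_cls,net_prio", "net_cls") would yield /sys/fs/cgroup/net_cls.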
[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830051#comment-16830051 ] Szilard Nemeth commented on YARN-9519: -- Hi [~adam.antal]! Thanks for this patch! +1 (non-binding) > TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9519.001.patch > > > The TFile log aggregation file format is not sensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config. > In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file formats, such as IFile, default to the file-format-specific config, which therefore takes priority. > From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (These configs are defined as:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static final String NM_REMOTE_APP_LOG_DIR = > NM_PREFIX + "remote-app-log-dir"; > {code} > I suggest TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and fall back to the yarn.nodemanager.remote-app-log-dir config only if that is not specified.
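For illustration, the change suggested in the description could look roughly like the following sketch, which mirrors the IFile snippet quoted above (this is not the attached YARN-9519.001.patch, and the helper class and method names are hypothetical):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical helper showing the suggested lookup order for TFile.
public class RemoteLogDirResolverSketch {

  /** Resolves the remote app-log dir for a controller name such as "TFile". */
  static Path resolveRemoteRootLogDir(Configuration conf,
      String fileControllerName) {
    // 1. Try the controller-specific key, e.g.
    //    yarn.log-aggregation.TFile.remote-app-log-dir
    String remoteDirKey = String.format(
        YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
        fileControllerName);
    String remoteDir = conf.get(remoteDirKey);
    if (remoteDir == null || remoteDir.isEmpty()) {
      // 2. Fall back to yarn.nodemanager.remote-app-log-dir and its default.
      remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
          YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
    }
    return new Path(remoteDir);
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.set("yarn.log-aggregation.TFile.remote-app-log-dir",
        "/custom/tfile-logs");
    // Prints /custom/tfile-logs; without the override it would print the
    // NM-wide remote-app-log-dir default instead.
    System.out.println(resolveRemoteRootLogDir(conf, "TFile"));
  }
}
{code}
With this lookup order, TFile honours yarn.log-aggregation.TFile.remote-app-log-dir when it is set and otherwise behaves exactly as it does today.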
[jira] [Updated] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9519: - Description: The TFile log aggregation file format is not sensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config. In {{LogAggregationTFileController$initInternal}}: {code:java} this.remoteRootLogDir = new Path( conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); {code} So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file format, like IFile defaults to the file format config, so its priority is higher. >From {{LogAggregationIndexedFileController$initInternal}}: {code:java} String remoteDirStr = String.format( YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, this.fileControllerName); String remoteDir = conf.get(remoteDirStr); if (remoteDir == null || remoteDir.isEmpty()) { remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); } {code} (Where these configs are: ) {code:java} public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; public static final String NM_REMOTE_APP_LOG_DIR = NM_PREFIX + "remote-app-log-dir"; {code} I suggest TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and only if that is not specified falls back to the yarn.nodemanager.remote-app-log-dir config. was: The TFile log aggregation file format is not sensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config. In {{LogAggregationTFileController$initInternal}}: {code:java} this.remoteRootLogDir = new Path( conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); {code} So the remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file format, like IFile defaults to the file format config, so its priority is bigger. >From {{LogAggregationIndexedFileController$initInternal}}: {code:java} String remoteDirStr = String.format( YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, this.fileControllerName); String remoteDir = conf.get(remoteDirStr); if (remoteDir == null || remoteDir.isEmpty()) { remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); } {code} (Where these configs are: ) {code:java} public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; public static final String NM_REMOTE_APP_LOG_DIR = NM_PREFIX + "remote-app-log-dir"; {code} I suggest TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and only if that is not specified falls back to the yarn.nodemanager.remote-app-log-dir config. > TFile log aggregation file format is insensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9519.001.patch > > > The TFile log aggregation file format is not sensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config. 
> In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file formats, such as IFile, default to the file-format-specific config, which therefore takes priority. > From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (These configs are defined as:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static f