[jira] [Commented] (YARN-9494) ApplicationHistoryServer endpoint access wrongly requested
[ https://issues.apache.org/jira/browse/YARN-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833408#comment-16833408 ] Hadoop QA commented on YARN-9494:
-----------------------------------
| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 35s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 2s | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 16s | trunk passed |
| +1 | compile | 8m 19s | trunk passed |
| +1 | checkstyle | 1m 15s | trunk passed |
| +1 | mvnsite | 2m 8s | trunk passed |
| +1 | shadedclient | 15m 8s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 48s | trunk passed |
| +1 | javadoc | 1m 31s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 31s | the patch passed |
| +1 | compile | 7m 36s | the patch passed |
| +1 | javac | 7m 36s | the patch passed |
| +1 | checkstyle | 1m 12s | the patch passed |
| +1 | mvnsite | 2m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 32s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 9s | the patch passed |
| +1 | javadoc | 1m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 4s | hadoop-yarn-server-web-proxy in the patch passed. |
| -1 | unit | 80m 38s | hadoop-yarn-server-resourcemanager in the patch failed. |
| -1 | unit | 27m 21s | hadoop-yarn-client in the patch failed. |
| +1 | asflicense | 0m 50s | The patch does not generate ASF License warnings. |
| | | 193m 38s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
| | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
| | hadoop.yarn.client.cli.TestLogsCLI |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9494 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967871/YARN-9494.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 978d47ce542a 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64
[jira] [Commented] (YARN-9514) RMAppBlock enable kill button only when user has rights
[ https://issues.apache.org/jira/browse/YARN-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833373#comment-16833373 ] Hadoop QA commented on YARN-9514:
-----------------------------------
| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 28s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 5s | trunk passed |
| +1 | compile | 2m 41s | trunk passed |
| +1 | checkstyle | 0m 45s | trunk passed |
| +1 | mvnsite | 1m 27s | trunk passed |
| +1 | shadedclient | 13m 15s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 10s | trunk passed |
| +1 | javadoc | 0m 57s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 15s | the patch passed |
| +1 | compile | 2m 42s | the patch passed |
| +1 | javac | 2m 42s | the patch passed |
| -0 | checkstyle | 0m 42s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 1 new + 20 unchanged - 2 fixed = 21 total (was 22) |
| +1 | mvnsite | 1m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 45s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 23s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 46s | hadoop-yarn-server-common in the patch passed. |
| -1 | unit | 80m 19s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 143m 20s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9514 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967865/YARN-9514-002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fcaab00c7f49 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build
[jira] [Updated] (YARN-9494) ApplicationHistoryServer endpoint access wrongly requested
[ https://issues.apache.org/jira/browse/YARN-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Xinglong updated YARN-9494:
-----------------------------------
    Attachment: YARN-9494.002.patch

> ApplicationHistoryServer endpoint access wrongly requested
> -----------------------------------------------------------
>
> Key: YARN-9494
> URL: https://issues.apache.org/jira/browse/YARN-9494
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2
> Reporter: Wang, Xinglong
> Priority: Minor
> Attachments: YARN-9494.001.patch, YARN-9494.002.patch
>
>
> With the following configuration, the resource manager redirects
> https://resourcemanager.hadoop.com:50030/proxy/application_1553677175329_47053/
> to 0.0.0.0:10200 when it can't find application_1553677175329_47053 in the application manager.
> {code:java}
> yarn.timeline-service.enabled = false
> yarn.timeline-service.generic-application-history.enabled = true
> {code}
> However, in this case no timeline service is enabled, so yarn.timeline-service.address is not defined, and 0.0.0.0:10200 is used as the timeline server access point.
> This combination of configuration is valid, because we have an in-house tool to analyze the generic-application-history files generated by the resource manager while we don't enable the timeline service.
> {code:java}
> HTTP ERROR 500
> Problem accessing /proxy/application_1553677175329_47053/. Reason:
> Call From x/10.22.59.23 to 0.0.0.0:10200 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> Caused by:
> java.net.ConnectException: Call From x/10.22.59.23 to 0.0.0.0:10200 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.GeneratedConstructorAccessor240.newInstance(Unknown Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
> at org.apache.hadoop.ipc.Client.call(Client.java:1498)
> at org.apache.hadoop.ipc.Client.call(Client.java:1398)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getApplicationReport(ApplicationHistoryProtocolPBClientImpl.java:108)
> at org.apache.hadoop.yarn.server.webproxy.AppReportFetcher.getApplicationReport(AppReportFetcher.java:137)
> at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.getApplicationReport(WebAppProxyServlet.java:251)
> at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.getFetchedAppReport(WebAppProxyServlet.java:491)
> at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:329)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:66)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
> at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
> at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
> at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
> at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
> at
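As a sketch of the configuration combination described above: the following is an illustration only, not the attached patch. It reproduces the reported settings in code and adds an explicit yarn.timeline-service.address (the host:port value is a placeholder) as a possible mitigation for the 0.0.0.0:10200 fallback; the property keys are the standard YARN keys.

{code:java}
import org.apache.hadoop.conf.Configuration;

public final class TimelineFallbackConf {
  // Reproduces the combination from the report: timeline service off,
  // generic application history on.
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.timeline-service.enabled", false);
    conf.setBoolean(
        "yarn.timeline-service.generic-application-history.enabled", true);
    // Without an explicit address, the default 0.0.0.0:10200 is what the
    // web proxy ends up calling for unknown applications. Setting it
    // explicitly points the AHS lookup at a real host (placeholder value;
    // adjust for your cluster).
    conf.set("yarn.timeline-service.address", "ahs.example.com:10200");
    return conf;
  }
}
{code}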
[jira] [Updated] (YARN-9514) RMAppBlock enable kill button only when user has rights
[ https://issues.apache.org/jira/browse/YARN-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-9514:
------------------------------
    Attachment: YARN-9514-002.patch

> RMAppBlock enable kill button only when user has rights
> --------------------------------------------------------
>
> Key: YARN-9514
> URL: https://issues.apache.org/jira/browse/YARN-9514
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Minor
> Attachments: YARN-9514-001.patch, YARN-9514-002.patch
>
>
> Listing of applications is based on the following:
> * admin.acl
> * Application ACLs (view & modify)
> For users who don't have modify/admin rights, the kill button for an application should not be shown; an unauthorized error is returned when it is invoked.
[jira] [Commented] (YARN-9514) RMAppBlock enable kill button only when user has rights
[ https://issues.apache.org/jira/browse/YARN-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833313#comment-16833313 ] Bilwa S T commented on YARN-9514:
-----------------------------------
Thanks [~bibinchundatt] for reviewing.
{quote}I don't think the fix is correct. ApplicationACLManager init will add the current user to the list of accessible users.
{quote}
AdminACLsManager will add the current user, which is the user configured for the key "yarn.resourcemanager.principal". So, with the fix, the killApplication button is not shown if the user doesn't have modify rights, is not an admin, and is not the principal user.
{quote}The kill button is applicable only for RMAppBlock, so the handling could be done in RMAppBlock. The implementation could be similar to createApplicationMetricsTable. Thoughts?
{quote}
I think handling it in RMAppBlock is correct.

> RMAppBlock enable kill button only when user has rights
> --------------------------------------------------------
>
> Key: YARN-9514
> URL: https://issues.apache.org/jira/browse/YARN-9514
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Bibin A Chundatt
> Assignee: Bilwa S T
> Priority: Minor
> Attachments: YARN-9514-001.patch
>
>
> Listing of applications is based on the following:
> * admin.acl
> * Application ACLs (view & modify)
> For users who don't have modify/admin rights, the kill button for an application should not be shown; an unauthorized error is returned when it is invoked.
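A minimal sketch of the kind of ACL gating discussed above, using YARN's ApplicationACLsManager and ApplicationAccessType.MODIFY_APP (both real YARN classes). This is an illustration only, not the attached YARN-9514 patch; the KillButtonGuard class and its method name are hypothetical.

{code:java}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;

public final class KillButtonGuard {
  private final ApplicationACLsManager aclsManager;

  public KillButtonGuard(ApplicationACLsManager aclsManager) {
    this.aclsManager = aclsManager;
  }

  // True iff the kill button should be rendered for this remote user.
  public boolean canShowKillButton(String remoteUser, String appOwner,
      ApplicationId appId) {
    if (remoteUser == null) {
      return false; // unauthenticated web UI: never render the button
    }
    UserGroupInformation callerUGI =
        UserGroupInformation.createRemoteUser(remoteUser);
    // checkAccess returns true for YARN admins, the application owner, and
    // users granted MODIFY_APP through the application's ACLs.
    return aclsManager.checkAccess(callerUGI,
        ApplicationAccessType.MODIFY_APP, appOwner, appId);
  }
}
{code}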
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833267#comment-16833267 ] Hadoop QA commented on YARN-9518:
-----------------------------------
| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 59s | trunk passed |
| +1 | compile | 1m 5s | trunk passed |
| +1 | checkstyle | 0m 27s | trunk passed |
| +1 | mvnsite | 0m 41s | trunk passed |
| +1 | shadedclient | 12m 1s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 59s | the patch passed |
| +1 | javac | 0m 59s | the patch passed |
| +1 | checkstyle | 0m 22s | the patch passed |
| +1 | mvnsite | 0m 36s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 58s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 6s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 21m 3s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 70m 42s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9518 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967853/YARN-9518-trunk.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 186a30d3364b 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d331a2a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24052/testReport/ |
| Max. process+thread count | 415 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24052/console |
| Powered by | Apache Yetus 0.8.0
[jira] [Comment Edited] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833259#comment-16833259 ] NedaMaleki edited comment on YARN-1021 at 5/5/19 8:29 AM:
-----------------------------------------------------------
Dear Wei Yan,

I use hadoop 2.4.1. When I want to run SLS, I face the same problem:

/usr/local/hadoop/share/hadoop/tools/sls/bin/slsrun.sh --input-rumen=/usr/local/hadoop/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json --output-dir=/usr/local/hadoop/share/hadoop/tools/sls/output
19/05/05 11:52:31 INFO conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop/etc/hadoop/core-site.xml
19/05/05 11:52:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/05 11:52:31 INFO security.Groups: clearing userToGroupsMap cache
19/05/05 11:52:31 INFO conf.Configuration: found resource yarn-site.xml at file:/usr/local/hadoop/etc/hadoop/yarn-site.xml
19/05/05 11:52:31 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
19/05/05 11:52:32 INFO security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 8640ms and NMTokenKeyActivationDelay: 90ms
19/05/05 11:52:32 INFO security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 8640ms and ContainerTokenKeyActivationDelay: 90ms
19/05/05 11:52:32 INFO security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
19/05/05 11:52:32 INFO resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper
java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:465)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:164)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:261)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:403)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:824)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:226)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class
[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832991#comment-16832991 ] Shurong Mai edited comment on YARN-9517 at 5/5/19 8:14 AM:
------------------------------------------------------------
Hi, [~wangda], I just thought the problem was resolved by the patch, so I closed this Jira issue. It is not fixed in these branches. I have reopened this issue.

was (Author: shurong.mai):
Hi, [~wangda], I just thought the problem was resolved by the patch, so I closed this Jira issue. I haven't committed the patch to these branches.

> When aggregation is not enabled, can't see the container log
> -------------------------------------------------------------
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 2.8.5, 2.7.7
> Reporter: Shurong Mai
> Priority: Major
> Labels: patch
> Attachments: YARN-9517-branch-2.8.5.001.patch, YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>false</value>
> </property>
> {code}
> When aggregation is not enabled, we click the "container log" link (in the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> It jumps to a web page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I submitted a patch which is simple and can apply to these hadoop versions.
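For context on the redirect described above: when aggregation is disabled, live logs are still served by the NodeManager web UI under /node/containerlogs/&lt;containerId&gt;/&lt;user&gt;. A hypothetical sketch of building that URL follows; it is an illustration only, not the attached patch, and the host, port, container id, and user values are placeholders.

{code:java}
public final class NmLogUrl {
  // Builds the NodeManager web UI URL that serves a container's local logs.
  public static String forContainer(String nmHttpAddress, String containerId,
      String user) {
    return "http://" + nmHttpAddress + "/node/containerlogs/" + containerId
        + "/" + user;
  }

  public static void main(String[] args) {
    // Placeholder values; 8042 is the default NM web port.
    System.out.println(forContainer("yy:8042",
        "container_1556431770792_0001_01_000002", "hadoop"));
  }
}
{code}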
[jira] [Comment Edited] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833259#comment-16833259 ] NedaMaleki edited comment on YARN-1021 at 5/5/19 8:12 AM:
-----------------------------------------------------------
Dear Wei Yan,

I use hadoop 2.4.1. When I want to run SLS, I face the same problem:

/usr/local/hadoop/share/hadoop/tools/sls/bin/slsrun.sh --input-rumen=/usr/local/hadoop/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json --output-dir=/usr/local/hadoop/share/hadoop/tools/sls/output
19/05/05 11:52:31 INFO conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop/etc/hadoop/core-site.xml
19/05/05 11:52:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/05 11:52:31 INFO security.Groups: clearing userToGroupsMap cache
19/05/05 11:52:31 INFO conf.Configuration: found resource yarn-site.xml at file:/usr/local/hadoop/etc/hadoop/yarn-site.xml
19/05/05 11:52:31 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
19/05/05 11:52:32 INFO security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 8640ms and NMTokenKeyActivationDelay: 90ms
19/05/05 11:52:32 INFO security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 8640ms and ContainerTokenKeyActivationDelay: 90ms
19/05/05 11:52:32 INFO security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
19/05/05 11:52:32 INFO resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper
java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:465)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:164)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:261)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:403)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:824)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:226)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
19/05/05 11:52:32 INFO
[jira] [Comment Edited] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833259#comment-16833259 ] NedaMaleki edited comment on YARN-1021 at 5/5/19 8:10 AM:
-----------------------------------------------------------
Dear Wei Yan,

I use hadoop 2.4.1. When I want to run SLS, I face the same problem:

RMTokenSecretManager: Rolling master-key for amrm-tokens
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
19/05/05 11:52:32 INFO resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper
java.lang.NullPointerException
    at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:82)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:465)
    at org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:164)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:261)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:403)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:824)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:226)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.startRM(SLSRunner.java:163)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:137)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
19/05/05 11:52:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
19/05/05 11:52:32 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
19/05/05 11:52:32 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
19/05/05 11:52:32 INFO impl.MetricsSystemImpl: ResourceManager metrics system started
19/05/05 11:52:32 INFO conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml
19/05/05 11:52:32 INFO capacity.ParentQueue: root, capacity=1.0, asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, acls=SUBMIT_APPLICATIONS:*ADMINISTER_QUEUE:*
19/05/05 11:52:32 INFO capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root
19/05/05 11:52:32 INFO capacity.LeafQueue: Initializing default capacity = 1.0 [= (float) configuredCapacity / 100 ] asboluteCapacity = 1.0 [= parentAbsoluteCapacity * capacity ] maxCapacity = 1.0 [= configuredMaxCapacity ] absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ] userLimit = 100 [= configuredUserLimit ] userLimitFactor = 1.0 [= configuredUserLimitFactor ] maxApplications = 1 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
[jira] [Comment Edited] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833259#comment-16833259 ] NedaMaleki edited comment on YARN-1021 at 5/5/19 8:05 AM:
-----------------------------------------------------------
Dear Wei Yan,

I use hadoop 2.4.1. When I want to run SLS, I face the same problem as YukunTsang:

19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource:
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
    ... 4 more

After waiting some minutes I got the following messages and then nothing :(

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2115.smile.com:0 clusterResource:
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2118.smile.com:1 clusterResource:
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2117.smile.com:2 clusterResource:
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2116.smile.com:3 clusterResource:

I noticed when it reaches to , it throws the exception and I do not know why.

1) I am looking forward to hearing from you as I am stuck here!

2) My second question is: how can I extend SLS? I mean, where shall I write my scheduler code in SLS, run it, and get results? (I need to simulate my scheduler and then compare it with other schedulers like FIFO, Fair, and Capacity)

Thanks a lot,
Neda

was (Author: nedamaleki):
Dear Wei Yan,

I use hadoop 2.4.1. When I want to run SLS, I face the same problem as YukunTsang:

19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource:
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
    ... 4 more

After waiting some minutes I got the following messages and then nothing :(

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833259#comment-16833259 ] NedaMaleki commented on YARN-1021:
------------------------------------
Dear Wei Yan,

I use hadoop 2.4.1. When I want to run SLS, I face the same problem as YukunTsang:

19/05/05 11:54:45 INFO capacity.CapacityScheduler: Added node a2116.smile.com:3 clusterResource:
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:394)
    at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:246)
    at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:141)
    at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:524)
Caused by: java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
    ... 4 more

After waiting some minutes I got the following messages and then nothing :(

19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2115.smile.com:0 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2118.smile.com:1 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2117.smile.com:2 Timed out after 600 secs
19/05/05 12:06:03 INFO util.AbstractLivelinessMonitor: Expired:a2116.smile.com:3 Timed out after 600 secs
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2115.smile.com:0 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2115.smile.com:0 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2118.smile.com:1 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2118.smile.com:1 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2117.smile.com:2 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2117.smile.com:2 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: Deactivating Node a2116.smile.com:3 as it is now LOST
19/05/05 12:06:03 INFO rmnode.RMNodeImpl: a2116.smile.com:3 Node Transitioned from RUNNING to LOST
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2115.smile.com:0 clusterResource:
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2118.smile.com:1 clusterResource:
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2117.smile.com:2 clusterResource:
19/05/05 12:06:03 INFO capacity.CapacityScheduler: Removed node a2116.smile.com:3 clusterResource:

I noticed when it reaches to , it throws the exception and I do not know why.

1) I am looking forward to hearing from you as I am stuck here!

2) My second question is: where can I extend SLS, i.e., where shall I write my scheduler code in SLS, run it, and get results? (I need to simulate my scheduler and then compare it with other schedulers like FIFO, Fair, and Capacity)

Thanks a lot,
Neda

> Yarn Scheduler Load Simulator
> ------------------------------
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Wei Yan
> Assignee: Wei Yan
> Priority: Major
> Fix For: 2.3.0
>
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
>
> The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm would work for some specific workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine.
[jira] [Commented] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833253#comment-16833253 ] Hadoop QA commented on YARN-9517:
-----------------------------------
| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 5s | Docker failed to build yetus/hadoop:749e106. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9517 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967856/YARN-9517-branch-2.8.5.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24053/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> When aggregation is not enabled, can't see the container log
> -------------------------------------------------------------
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 2.8.5, 2.7.7
> Reporter: Shurong Mai
> Priority: Major
> Labels: patch
> Attachments: YARN-9517-branch-2.8.5.001.patch, YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>false</value>
> </property>
> {code}
> When aggregation is not enabled, we click the "container log" link (in the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> It jumps to a web page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I submitted a patch which is simple and can apply to these hadoop versions.
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833250#comment-16833250 ] Hadoop QA commented on YARN-9518:
-----------------------------------
| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 11m 16s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2.7.7 Compile Tests ||
| +1 | mvninstall | 8m 32s | branch-2.7.7 passed |
| +1 | compile | 0m 28s | branch-2.7.7 passed |
| +1 | checkstyle | 0m 21s | branch-2.7.7 passed |
| +1 | mvnsite | 0m 33s | branch-2.7.7 passed |
| +1 | javadoc | 0m 25s | branch-2.7.7 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 25s | the patch passed |
| +1 | compile | 0m 28s | the patch passed |
| +1 | javac | 0m 28s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 6m 39s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 32m 11s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:06eafee |
| JIRA Issue | YARN-9518 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967852/YARN-9518-branch-2.7.7.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 82a24890f60f 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.7.7 / e446276 |
| maven | version: Apache Maven 3.0.5 |
| Default Java | 1.7.0_201 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24051/testReport/ |
| Max. process+thread count | 155 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24051/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> can't use CGroups with YARN in centos7
> ---------------------------------------
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
> Reporter: Shurong Mai
> Priority: Major
>
[jira] [Reopened] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai reopened YARN-9517:
---------------------------------

> When aggregation is not enabled, can't see the container log
> -------------------------------------------------------------
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
> Reporter: Shurong Mai
> Priority: Major
> Labels: patch
> Attachments: YARN-9517.patch
>
>
> yarn-site.xml
> {code:java}
> <property>
>   <name>yarn.log-aggregation-enable</name>
>   <value>false</value>
> </property>
> {code}
> When aggregation is not enabled, we click the "container log" link (in the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> It jumps to a web page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all hadoop versions 2.x.y and 3.x.y, and I submitted a patch which is simple and can apply to these hadoop versions.
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833248#comment-16833248 ] Shurong Mai commented on YARN-9518:
---
[~Jim_Brennan], I have submitted the patch for branch-2.7.7 (which also applies to 2.7.x and 2.8.x) and the patch for trunk (which also applies to 2.9.x, 3.1.x, and 3.2.x).
> can't use CGroups with YARN in centos7
> ---
>
> Key: YARN-9518
> URL: https://issues.apache.org/jira/browse/YARN-9518
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
> Reporter: Shurong Mai
> Priority: Major
> Labels: cgroup, patch
> Attachments: YARN-9518-branch-2.7.7.001.patch, YARN-9518-trunk.001.patch, YARN-9518.patch
>
> The OS version is CentOS 7.
> {code:bash}
> cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
> {code}
> After I had set the cgroup configuration variables for YARN, the nodemanager started without any problems. But when I ran a job, it failed with the exceptional nodemanager logs quoted at the end.
> The key line in those logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
> After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows:
> {code:none}
> /sys/fs/cgroup/cpu
> /sys/fs/cgroup/cpuacct
> {code}
> But in CentOS 7 they look like this:
> {code:none}
> /sys/fs/cgroup/cpu -> cpu,cpuacct
> /sys/fs/cgroup/cpuacct -> cpu,cpuacct
> /sys/fs/cgroup/cpu,cpuacct
> {code}
> "cpu" and "cpuacct" have been merged into "cpu,cpuacct", and "cpu" and "cpuacct" are symbolic links to it.
> Looking at the source code, the nodemanager reads the cgroup subsystem info from /proc/mounts, so it resolves both the cpu and the cpuacct subsystem path to "/sys/fs/cgroup/cpu,cpuacct".
> The resource description argument passed to container-executor then looks like this:
> {code:none}
> cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> There is a comma in the cgroup path, but the comma is the separator between multiple resources. container-executor therefore truncates the cgroup path to "/sys/fs/cgroup/cpu" instead of the correct path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and reports the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory".
> Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument to container-executor becomes:
> {code:none}
> cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks
> {code}
> Note that there is no comma in this path, and it is still a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct".
> After applying the patch, the problem is resolved and the job runs successfully.
> The patch is compatible with the cgroup paths of earlier OS versions such as CentOS 6 as well as CentOS 7, and applies generally to other cgroup subsystem paths, for example the network subsystems:
> {code:none}
> /sys/fs/cgroup/net_cls -> net_cls,net_prio
> /sys/fs/cgroup/net_prio -> net_cls,net_prio
> /sys/fs/cgroup/net_cls,net_prio
> {code}
> {panel:title=exceptional nodemanager logs:}
> 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING
> 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27
> 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_1554210318404_0042_01_01 and exit code: 27
> ExitCodeException exitCode=27:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
> at org.apache.hadoop.util.Shell.run(Shell.java:482)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> at
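A minimal, self-contained sketch of the idea described above (illustrative only, not the attached patch; the CgroupPathResolver class and controllerPath method are made-up names): when /proc/mounts reports a merged mount point such as /sys/fs/cgroup/cpu,cpuacct, substituting the per-controller symlink removes the comma while keeping the path valid.
{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch, not the attached patch: prefer the per-controller
// symlink (/sys/fs/cgroup/cpu) over the merged mount point reported in
// /proc/mounts (/sys/fs/cgroup/cpu,cpuacct), so that the "cgroups=..."
// argument handed to container-executor never contains a comma.
public class CgroupPathResolver {

  static String controllerPath(String mountPoint, String controller) {
    Path mount = Paths.get(mountPoint);
    Path last = mount.getFileName();
    // On CentOS 7 the last path segment lists the merged controllers,
    // separated by commas, e.g. "cpu,cpuacct".
    if (last != null && last.toString().contains(",")) {
      for (String c : last.toString().split(",")) {
        if (c.equals(controller)) {
          // /sys/fs/cgroup/cpu is a symlink to /sys/fs/cgroup/cpu,cpuacct,
          // so the substituted path resolves to the same directory.
          Path candidate = mount.resolveSibling(controller);
          if (Files.exists(candidate)) {
            return candidate.toString();
          }
        }
      }
    }
    // CentOS 6 layout (no comma): return the mount point unchanged.
    return mountPoint;
  }

  public static void main(String[] args) {
    // Prints /sys/fs/cgroup/cpu on a CentOS 7 style layout.
    System.out.println(controllerPath("/sys/fs/cgroup/cpu,cpuacct", "cpu"));
    // No comma in the path: returned unchanged (CentOS 6 style layout).
    System.out.println(controllerPath("/sys/fs/cgroup/cpu", "cpu"));
  }
}
{code}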
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518-trunk.001.patch
[jira] [Comment Edited] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833199#comment-16833199 ] Shurong Mai edited comment on YARN-9518 at 5/5/19 7:00 AM:
---
[~Jim_Brennan], thank you for your attention and guidance. I have looked at the source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x, and 3.2.x; they all have the same problem. But the patch can only be applied to 2.7.x and 2.8.x, because 2.9.x, 3.1.x, and 3.2.x (the same as trunk) differ slightly in the source code context of the patch. So I need to make another patch for 2.9.x, 3.1.x, and 3.2.x.

was (Author: shurong.mai):
[~Jim_Brennan], thank you for your attention and guidance. I have looked at the source code of versions 2.7.x, 2.8.x, 2.9.x, 3.1.x, and 3.2.x; they all have the same problem. But the patch can only be applied to 2.7.x and 2.8.x, because 2.9.x, 3.1.x, and 3.2.x differ slightly in the source code context of the patch. So I need to make another patch for 2.9.x, 3.1.x, and 3.2.x.
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518-branch-2.7.7.001.patch
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518:
--
Attachment: (was: YARN-9518-branch-2.7.7.patch)
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518:
--
Attachment: YARN-9518-branch-2.7.7.patch