[jira] [Resolved] (YARN-10557) Application may be leaked in state store when resourcemanager failover.
[ https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu resolved YARN-10557.
Release Note: YARN-9848
Resolution: Duplicate

I think this duplicates YARN-9848.

> Application may be leaked in state store when resourcemanager failover.
> ---
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Labels: resourcemanager
> Fix For: 3.3.1
>
> In the ResourceManager log, I found many entries like the one below:
> {code}
> 2020-12-30 19:18:48,120 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of completed apps kept in state store met: maxCompletedAppsInStateStore = 2000, but not removing app application_1608912003714_0098 from state store as log aggregation have not finished yet.
> {code}
> When I looked into this, I found that the application's logs had already been aggregated. While debugging, I found that the app's logAggregationStatusForAppReport is NOT_START. (Note: in my test cluster, I occasionally simulate an RM restart.)
>
> Suppose an application has finished and its logs have been aggregated, but it has not yet been removed from the RM. When the RM fails over, the new RM recovers the application from the state store (the log aggregation status is not stored there, so the app cannot be removed), but logAggregationStatusForAppReport is never updated. It therefore stays NOT_START, and the app is never removed from the state store.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-10557) Application may be leaked in state store when resourcemanager failover.
zhengchenyu created YARN-10557:
--
Summary: Application may be leaked in state store when resourcemanager failover.
Key: YARN-10557
URL: https://issues.apache.org/jira/browse/YARN-10557
Project: Hadoop YARN
Issue Type: Bug
Reporter: zhengchenyu

In the ResourceManager log, I found many entries like the one below:
{code}
2020-12-30 19:18:48,120 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of completed apps kept in state store met: maxCompletedAppsInStateStore = 2000, but not removing app application_1608912003714_0098 from state store as log aggregation have not finished yet.
{code}
When I looked into this, I found that the application's logs had already been aggregated. While debugging, I found that the app's logAggregationStatusForAppReport is NOT_START. (Note: in my test cluster, I occasionally simulate an RM restart.)

Suppose an application has finished and its logs have been aggregated, but it has not yet been removed from the RM. When the RM fails over, the new RM recovers the application from the state store, but logAggregationStatusForAppReport is never updated. It therefore stays NOT_START, and the app is never removed from the state store.
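The failure mode described in this report can be illustrated with a small, self-contained Java sketch. This is not the RMAppManager code: the class StateStoreLeakSketch, the enum LogAggStatus, and the methods recover and evict are hypothetical names introduced only for illustration. The sketch assumes that only the application ids survive a failover and that the eviction loop skips any app whose log aggregation is not marked as finished, which is the behaviour the report describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StateStoreLeakSketch {

    // Hypothetical stand-in for the log aggregation status kept for the app report.
    public enum LogAggStatus { NOT_START, SUCCEEDED }

    // Hypothetical in-memory view of a completed application.
    public static final class CompletedApp {
        final String id;
        final LogAggStatus status;

        public CompletedApp(String id, LogAggStatus status) {
            this.id = id;
            this.status = status;
        }
    }

    // Assumption: the state store keeps the app id but not the log aggregation
    // status, so every recovered app starts again at NOT_START.
    public static List<CompletedApp> recover(List<String> storedAppIds) {
        List<CompletedApp> recovered = new ArrayList<>();
        for (String id : storedAppIds) {
            recovered.add(new CompletedApp(id, LogAggStatus.NOT_START));
        }
        return recovered;
    }

    // Removes the oldest completed apps until at most maxKept remain, but skips
    // apps whose log aggregation is not finished. Returns the ids left in the store.
    public static List<String> evict(List<CompletedApp> apps, int maxKept) {
        List<String> kept = new ArrayList<>();
        int excess = apps.size() - maxKept;
        for (CompletedApp app : apps) {
            boolean aggregationDone = app.status == LogAggStatus.SUCCEEDED;
            if (excess > 0 && aggregationDone) {
                excess--;          // app is removed from the state store
            } else {
                kept.add(app.id);  // app stays in the state store
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("app_1", "app_2", "app_3");

        // Before failover: aggregation is known to be finished, eviction works.
        List<CompletedApp> beforeFailover = new ArrayList<>();
        for (String id : ids) {
            beforeFailover.add(new CompletedApp(id, LogAggStatus.SUCCEEDED));
        }
        System.out.println(evict(beforeFailover, 1));  // prints [app_3]

        // After failover: status is back at NOT_START, nothing is ever evicted.
        System.out.println(evict(recover(ids), 1));    // prints [app_1, app_2, app_3]
    }
}
```

In the second call every app is skipped by the eviction loop, so the store grows past the configured limit, which matches the repeated "not removing app ... as log aggregation have not finished yet" log line quoted above.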
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/

No changes

-1 overall

The following subsystems voted -1:
    pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    Failed junit tests :

       hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
       hadoop.hdfs.server.federation.router.TestRouterAllResolver
       hadoop.hdfs.server.federation.router.TestRouterMountTableCacheRefreshSecure
       hadoop.hdfs.server.federation.router.TestRouterRpc
       hadoop.hdfs.server.federation.router.TestSafeMode
       hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
       hadoop.yarn.applications.distributedshell.TestDistributedShell
       hadoop.tools.dynamometer.TestDynamometerInfra
       hadoop.tools.dynamometer.TestDynamometerInfra

   cc:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/diff-compile-cc-root.txt [48K]

   javac:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/diff-compile-javac-root.txt [564K]

   checkstyle:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/diff-checkstyle-root.txt [16M]

   pathlen:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/pathlen.txt [12K]

   pylint:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/diff-patch-pylint.txt [60K]

   shellcheck:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/diff-patch-shellcheck.txt [20K]

   shelldocs:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/diff-patch-shelldocs.txt [44K]

   whitespace:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/whitespace-eol.txt [13M]
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/whitespace-tabs.txt [2.0M]

   xml:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/xml.txt [24K]

   javadoc:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/diff-javadoc-javadoc-root.txt [2.0M]

   unit:
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [344K]
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [92K]
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [100K]
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt [548K]
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer_hadoop-dynamometer-infra.txt [8.0K]
       https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/371/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt [24K]

Powered by Apache Yetus 0.12.0 https://yetus.apache.org
[jira] [Created] (YARN-10556) Web-app server does not work for V2 timeline
Ahmed Hussein created YARN-10556:

Summary: Web-app server does not work for V2 timeline
Key: YARN-10556
URL: https://issues.apache.org/jira/browse/YARN-10556
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Ahmed Hussein

{{TestDistributedShell}} for timeline version 2.0 shows errors in the log files, with the exception below. A previous issue, YARN-3087, added a fix for the same problem. We need to investigate whether this is a testing issue or whether the error has resurfaced.
{code:bash}
org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline/clusters/yarn_cluster/apps/application_1609346161655_0001: controller for v2 not found
    at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:247)
    at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:155)
    at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:152)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
    at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
    at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
    at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
    at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1702)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at
[jira] [Created] (YARN-10555) missing security check before getAppAttempts
lujie created YARN-10555:

Summary: missing security check before getAppAttempts
Key: YARN-10555
URL: https://issues.apache.org/jira/browse/YARN-10555
Project: Hadoop YARN
Issue Type: Bug
Reporter: lujie
[jira] [Resolved] (YARN-10551) non-admin user can change the log level
[ https://issues.apache.org/jira/browse/YARN-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie resolved YARN-10551.
--
Resolution: Not A Problem

This was a misconfiguration. See https://issues.apache.org/jira/secure/attachment/12832635/HADOOP-13707.001.patch

> non-admin user can change the log level
> ---
>
> Key: YARN-10551
> URL: https://issues.apache.org/jira/browse/YARN-10551
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: lujie
> Priority: Major
>
> To reproduce:
> 1. Log in as user1 and run
> {code:java}
> yarn daemonlog -setlevel hadoop11:8088 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl DEBUG
> {code}
> 2. Log in as user2 and run wordcount.
> 3. Check the RM log:
> {code:java}
> 2020-12-27 10:54:15,917 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Processing event for application_1609065586411_0003 of type START
> {code}