[ https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengchenyu updated YARN-10557:
-------------------------------
    Component/s: RM

> Application may be leaked in state store when resourcemanager failover.
> -----------------------------------------------------------------------
>
>                 Key: YARN-10557
>                 URL: https://issues.apache.org/jira/browse/YARN-10557
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 3.2.1
>            Reporter: zhengchenyu
>            Priority: Major
>             Fix For: 3.3.1
>
>
> In the ResourceManager log, I found a large number of entries like the
> one below:
> {code}
> 2020-12-30 19:18:48,120 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of
> completed apps kept in state store met: maxCompletedAppsInStateStore = 2000,
> but not removing app application_1608912003714_0098 from state store as log
> aggregation have not finished yet.
> {code}
> When I looked into this, I found that the application's logs had already
> been aggregated. While debugging, I found that the app's
> logAggregationStatusForAppReport was NOT_START.
> (Note: in my test cluster, I occasionally simulate an RM restart.)
> Suppose an application has finished and its logs have been aggregated, but
> it has not yet been removed from the RM. When the RM fails over, the new RM
> recovers the application from the state store, but
> logAggregationStatusForAppReport is not updated, so it stays NOT_START.
> The app is then never removed from the state store.
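
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual RMAppManager code: the class StateStoreLeakSketch, the type CompletedApp, and the methods evictOverLimit and appsLeftInStore are hypothetical names invented for illustration, and the two-value enum is a simplified stand-in for the real log aggregation status. It assumes, as the report states, that a recovered app keeps NOT_START and that the limit check skips any app in that state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Toy model only: none of these types are the real Hadoop classes.
class StateStoreLeakSketch {

    // Simplified stand-in for the log aggregation status mentioned in the report.
    enum LogAggregationStatus { NOT_START, SUCCEEDED }

    // Hypothetical stand-in for a completed application tracked by the RM.
    static final class CompletedApp {
        final String id;
        final LogAggregationStatus statusForAppReport;

        CompletedApp(String id, LogAggregationStatus statusForAppReport) {
            this.id = id;
            this.statusForAppReport = statusForAppReport;
        }
    }

    // Evicts the oldest apps above the limit, but skips any app whose log
    // aggregation does not look finished. Returns the number of apps evicted.
    static int evictOverLimit(Deque<CompletedApp> stateStore, int maxCompletedApps) {
        int evicted = 0;
        Iterator<CompletedApp> it = stateStore.iterator();
        while (stateStore.size() > maxCompletedApps && it.hasNext()) {
            CompletedApp app = it.next();
            if (app.statusForAppReport == LogAggregationStatus.NOT_START) {
                continue; // corresponds to the "not removing app ..." log line
            }
            it.remove();
            evicted++;
        }
        return evicted;
    }

    // Returns how many completed apps remain in the store after the limit check.
    // recoveredAfterFailover = true models the reported behaviour: the status
    // is not restored on recovery and stays NOT_START.
    static int appsLeftInStore(int finishedApps, int maxCompletedApps,
                               boolean recoveredAfterFailover) {
        LogAggregationStatus status = recoveredAfterFailover
                ? LogAggregationStatus.NOT_START
                : LogAggregationStatus.SUCCEEDED;
        Deque<CompletedApp> stateStore = new ArrayDeque<>();
        for (int i = 1; i <= finishedApps; i++) {
            stateStore.addLast(new CompletedApp("app_" + i, status));
        }
        evictOverLimit(stateStore, maxCompletedApps);
        return stateStore.size();
    }

    public static void main(String[] args) {
        System.out.println("without failover: " + appsLeftInStore(5, 2, false));
        System.out.println("after failover:   " + appsLeftInStore(5, 2, true));
    }
}
```

With five finished apps and a limit of two, the model keeps two apps when the status is SUCCEEDED and all five when every recovered app reports NOT_START, which matches the leak described in the report.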