[jira] [Updated] (YARN-10557) Application may be leaked in state store when resourcemanager failover.

2020-12-30 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-10557:
---
Description: 
In the ResourceManager log, I found a large number of entries like the following: 
{code}
2020-12-30 19:18:48,120 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of 
completed apps kept in state store met: maxCompletedAppsInStateStore = 2000, 
but not removing app application_1608912003714_0098 from state store as log 
aggregation have not finished yet.
{code}

When I looked into this, I found that the application's logs had already been 
aggregated. While debugging, I found that the app's 
logAggregationStatusForAppReport was NOT_START. 
(Note: In my test cluster, I occasionally simulate an RM restart.)

Suppose an application has finished and its logs have been aggregated, but the 
app has not yet been removed from the RM. When the RM fails over, the new RM 
recovers the app from the state store (the log aggregation status is not stored 
there, so the new RM cannot tell that the app is safe to remove), but 
logAggregationStatusForAppReport is not updated. It therefore stays NOT_START, 
and the app is never removed from the state store. 



  was:
In the ResourceManager log, I found a large number of entries like the following: 
{code}
2020-12-30 19:18:48,120 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of 
completed apps kept in state store met: maxCompletedAppsInStateStore = 2000, 
but not removing app application_1608912003714_0098 from state store as log 
aggregation have not finished yet.
{code}

When I looked into this, I found that the application's logs had already been 
aggregated. While debugging, I found that the app's 
logAggregationStatusForAppReport was NOT_START. 
(Note: In my test cluster, I occasionally simulate an RM restart.)

Suppose an application has finished and its logs have been aggregated, but the 
app has not yet been removed from the RM. When the RM fails over, the new RM 
recovers the app from the state store, but logAggregationStatusForAppReport is 
not updated. It therefore stays NOT_START, and the app is never removed from 
the state store. 




> Application may be leaked in state store when resourcemanager failover.
> ---
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Labels: resourcemanager
> Fix For: 3.3.1
>
>
> In the ResourceManager log, I found a large number of entries like the following: 
> {code}
> 2020-12-30 19:18:48,120 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Max number of 
> completed apps kept in state store met: maxCompletedAppsInStateStore = 2000, 
> but not removing app application_1608912003714_0098 from state store as log 
> aggregation have not finished yet.
> {code}
> When I looked into this, I found that the application's logs had already been 
> aggregated. While debugging, I found that the app's 
> logAggregationStatusForAppReport was NOT_START. 
> (Note: In my test cluster, I occasionally simulate an RM restart.)
> Suppose an application has finished and its logs have been aggregated, but the 
> app has not yet been removed from the RM. When the RM fails over, the new RM 
> recovers the app from the state store (the log aggregation status is not 
> stored there, so the new RM cannot tell that the app is safe to remove), but 
> logAggregationStatusForAppReport is not updated. It therefore stays NOT_START, 
> and the app is never removed from the state store. 
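
The failure mode described in the issue can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions and is not Hadoop code: the class StateStoreLeakSketch, the two-value Status enum, and the methods recoverFromStateStore and trimStateStore are hypothetical names invented for illustration. Only logAggregationStatusForAppReport, NOT_START, and maxCompletedAppsInStateStore come from the report. The model assumes the cleanup pass skips any app whose log aggregation does not look finished, as the log message suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the reported behavior; none of this is Hadoop code. */
public class StateStoreLeakSketch {

    /** Hypothetical two-value stand-in for the real status enum. */
    public enum Status { NOT_START, SUCCEEDED }

    /** A completed application as the RM holds it in memory. */
    public static final class CompletedApp {
        final String id;
        final Status logAggregationStatusForAppReport;

        public CompletedApp(String id, Status status) {
            this.id = id;
            this.logAggregationStatusForAppReport = status;
        }
    }

    /** Recovery after failover: only the IDs come back, so every status starts at NOT_START. */
    public static List<CompletedApp> recoverFromStateStore(List<String> storedIds) {
        List<CompletedApp> apps = new ArrayList<>();
        for (String id : storedIds) {
            apps.add(new CompletedApp(id, Status.NOT_START));
        }
        return apps;
    }

    /**
     * Trims the oldest completed apps down to the limit, skipping any app whose
     * log aggregation does not look finished. Returns the IDs left in the store.
     */
    public static List<String> trimStateStore(List<CompletedApp> apps,
                                              int maxCompletedAppsInStateStore) {
        List<String> kept = new ArrayList<>();
        int excess = apps.size() - maxCompletedAppsInStateStore;
        for (CompletedApp app : apps) {
            boolean finished = app.logAggregationStatusForAppReport == Status.SUCCEEDED;
            if (excess > 0 && finished) {
                excess--;          // removed from the state store
            } else {
                kept.add(app.id);  // kept: within the limit, or "not finished yet"
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("app_1", "app_2", "app_3");

        // Before failover: the RM knows aggregation succeeded, so trimming works.
        List<CompletedApp> before = new ArrayList<>();
        for (String id : ids) {
            before.add(new CompletedApp(id, Status.SUCCEEDED));
        }
        System.out.println("before failover: " + trimStateStore(before, 1));

        // After failover: every status is NOT_START again, so nothing is removed.
        System.out.println("after failover: " + trimStateStore(recoverFromStateStore(ids), 1));
    }
}
```

In this model, trimming to a limit of 1 leaves a single app before failover, while after the simulated recovery all three apps stay in the store no matter how often the trim runs, which matches the leak described above.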



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10557) Application may be leaked in state store when resourcemanager failover.

2020-12-30 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-10557:
---
Component/s: resourcemanager  (was: RM)

> Application may be leaked in state store when resourcemanager failover.
> ---
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Priority: Major
> Labels: resourcemanager
> Fix For: 3.3.1






[jira] [Updated] (YARN-10557) Application may be leaked in state store when resourcemanager failover.

2020-12-30 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-10557:
---
Labels: resourcemanager  (was: )

> Application may be leaked in state store when resourcemanager failover.
> ---
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Priority: Major
> Labels: resourcemanager
> Fix For: 3.3.1






[jira] [Updated] (YARN-10557) Application may be leaked in state store when resourcemanager failover.

2020-12-30 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-10557:
---
Component/s: RM

> Application may be leaked in state store when resourcemanager failover.
> ---
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Priority: Major
> Fix For: 3.3.1






[jira] [Updated] (YARN-10557) Application may be leaked in state store when resourcemanager failover.

2020-12-30 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-10557:
---
Fix Version/s: 3.3.1

> Application may be leaked in state store when resourcemanager failover.
> ---
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Priority: Major
> Fix For: 3.3.1






[jira] [Updated] (YARN-10557) Application may be leaked in state store when resourcemanager failover.

2020-12-30 Thread zhengchenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-10557:
---
Affects Version/s: 3.2.1

> Application may be leaked in state store when resourcemanager failover.
> ---
>
> Key: YARN-10557
> URL: https://issues.apache.org/jira/browse/YARN-10557
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: zhengchenyu
> Priority: Major


