[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-4041:
-----------------------------
    Fix Version/s: 2.8.0

> Slow delegation token renewal can severely prolong RM recovery
> ---------------------------------------------------------------
>
>                 Key: YARN-4041
>                 URL: https://issues.apache.org/jira/browse/YARN-4041
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Sunil G
>             Fix For: 2.8.0, 2.7.2, 3.0.0-alpha1
>
>         Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch,
> 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew
> delegation tokens for every active application. If a token server happens to
> be down or is running slow and a lot of the active apps were using tokens
> from that server then it can have a huge impact on the time it takes the RM
> to process the restart.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4041:
--------------------------
    Attachment: 0005-YARN-4041.patch

Yes [~jlowe], we can compare against the token itself and wait in smaller units.
In the new patch I kept a total wait time of 1 second, but in 10 ms units.
Locally, the test run seems faster. Uploading a new patch.
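The wait described in the comment on 0005-YARN-4041.patch (a total budget of 1 second, checked in 10 ms units) could be sketched roughly as below. This is not code from the patch: the class name {{PollingWait}}, the method {{waitFor}}, and the {{AtomicBoolean}} standing in for "the token has been renewed" are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hadoop or of the YARN-4041 patches.
final class PollingWait {

    /**
     * Checks the condition every intervalMs until it holds or timeoutMs
     * has elapsed. Returns true only if the condition was seen to hold.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // budget exhausted
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: a background thread flips the flag after about 50 ms,
        // standing in for an asynchronous renewal finishing.
        AtomicBoolean renewed = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            renewed.set(true);
        });
        worker.start();

        // 1 second total, polled in 10 ms units, as in the comment above.
        System.out.println("condition met: " + waitFor(renewed::get, 1000, 10));
    }
}
```

Polling in small units lets a test return as soon as the condition holds instead of always sleeping for the full second, which would be consistent with the faster local run reported above.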
[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4041:
--------------------------
    Attachment: 0004-YARN-4041.patch

Hi [~jlowe] and [~jianhe],
Please find an updated patch. I corrected the test case to wait for the
{{renewerService}} thread pool executor to process the renew event that was raised.
Kindly share your thoughts.
[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4041:
--------------------------
    Attachment: 0003-YARN-4041.patch

Updating patch after test case fix.
[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4041:
--------------------------
    Attachment: 0002-YARN-4041.patch

Thank you [~jianhe] and [~jlowe].
As per the latest Jenkins run, the patch needs a rebase. Attaching a rebased version.
Tests are passing locally.
[jira] [Updated] (YARN-4041) Slow delegation token renewal can severely prolong RM recovery
[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-4041:
--------------------------
    Attachment: 0001-YARN-4041.patch

Uploading an initial work-in-progress patch in which token renewal is made
asynchronous, using {{DelegationTokenRenewerRunnable}}.
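As a rough illustration of the idea in the comment on 0001-YARN-4041.patch, the sketch below hands each application's token renewal to a thread pool so that the recovery loop does not block on a slow token server. It is not the patch and does not use Hadoop's classes: {{AsyncRenewalSketch}}, {{TokenRenewer}}, {{recover}}, {{isRenewed}} and {{shutdownAndWait}} are hypothetical names, and the "token server" is simulated with a sleep.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from Hadoop or the patch.
final class AsyncRenewalSketch {

    /** Stand-in for a call to a token server that may be slow or down. */
    public interface TokenRenewer {
        void renew(String appId) throws Exception;
    }

    private final TokenRenewer renewer;
    private final ExecutorService pool;
    private final Set<String> renewed = ConcurrentHashMap.newKeySet();

    AsyncRenewalSketch(TokenRenewer renewer, int threads) {
        this.renewer = renewer;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Queues one renewal per app and returns without waiting for any of them. */
    void recover(List<String> appIds) {
        for (String appId : appIds) {
            pool.execute(() -> {
                try {
                    renewer.renew(appId);
                    renewed.add(appId);
                } catch (Exception e) {
                    // The sketch only logs; handling the failed app is out of scope here.
                    System.err.println("renewal failed for " + appId + ": " + e);
                }
            });
        }
    }

    boolean isRenewed(String appId) {
        return renewed.contains(appId);
    }

    /** Stops accepting work and waits up to timeoutMs for queued renewals. */
    boolean shutdownAndWait(long timeoutMs) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: each renewal takes about 100 ms on the simulated server.
        AsyncRenewalSketch rm = new AsyncRenewalSketch(appId -> Thread.sleep(100), 4);
        long start = System.nanoTime();
        rm.recover(Arrays.asList("app_1", "app_2", "app_3"));
        long queuedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("recover() returned after " + queuedMs + " ms");
        System.out.println("all renewals finished: " + rm.shutdownAndWait(5000));
    }
}
```

In this sketch {{recover}} only pays the cost of queuing the tasks, so a slow server delays individual renewals rather than the whole recovery pass. The trade-off is that callers, including tests, can no longer assume a token has been renewed by the time {{recover}} returns.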