[ https://issues.apache.org/jira/browse/YARN-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sunil G updated YARN-4041:
--------------------------
    Attachment: 0005-YARN-4041.patch

Yes [~jlowe], we can compare against the token itself and wait in smaller units. In the new patch I kept the total wait time at 1 sec, but in 10 ms units. Local test runs seem faster.
Uploading a new patch.

> Slow delegation token renewal can severely prolong RM recovery
> --------------------------------------------------------------
>
>                 Key: YARN-4041
>                 URL: https://issues.apache.org/jira/browse/YARN-4041
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Sunil G
>         Attachments: 0001-YARN-4041.patch, 0002-YARN-4041.patch,
> 0003-YARN-4041.patch, 0004-YARN-4041.patch, 0005-YARN-4041.patch
>
>
> When the RM does a work-preserving restart it synchronously tries to renew
> delegation tokens for every active application. If a token server happens to
> be down or is running slow and a lot of the active apps were using tokens
> from that server then it can have a huge impact on the time it takes the RM
> to process the restart.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
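
For illustration only, the wait strategy mentioned in the comment (a total budget of 1 sec, checked in 10 ms units) could look roughly like the sketch below. This is not the code in 0005-YARN-4041.patch; the class name `PollingWait`, the method `waitFor`, and the condition used in `main` are hypothetical stand-ins for whatever the test actually compares against the token.

```java
import java.util.function.BooleanSupplier;

public class PollingWait {

    /**
     * Re-checks the condition every intervalMs milliseconds until it holds
     * or timeoutMs milliseconds have elapsed. Returns true if the condition
     * was met, false on timeout or if the thread was interrupted.
     */
    public static boolean waitFor(BooleanSupplier condition,
                                  long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // wait budget used up
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical condition standing in for "the expected token is seen".
        boolean met = waitFor(
            () -> System.currentTimeMillis() - start >= 50, 1000, 10);
        System.out.println("condition met: " + met);
    }
}
```

Compared with a single fixed 1 sec sleep, the loop returns as soon as the condition holds, which would be consistent with the faster local test runs reported above.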