[ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157442#comment-17157442 ]
Jim Brennan commented on YARN-10348:
------------------------------------

I put up a patch for branch-3.2 and verified that it also applies cleanly, builds, and passes the test on branch-3.1 and branch-2.10.

> Allow RM to always cancel tokens after app completes
> ----------------------------------------------------
>
>                 Key: YARN-10348
>                 URL: https://issues.apache.org/jira/browse/YARN-10348
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.10.0, 3.1.3
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1
>
>         Attachments: YARN-10348-branch-3.2.001.patch, YARN-10348.001.patch,
>                      YARN-10348.002.patch
>
>
> (Note: this change was originally done on our internal branch by [~daryn]).
> The RM currently has an option for a client to specify disabling token
> cancellation when a job completes. This feature was an initial attempt to
> address the use case of a job launching sub-jobs (i.e., oozie launcher) and the
> original job finishing prior to the sub-job(s) completion, e.g., original job
> completion triggered premature cancellation of tokens needed by the sub-jobs.
> Many years ago, [~daryn] added a more robust implementation to ref count
> tokens ([YARN-3055]). This prevented premature cancellation of the token
> until all apps using the token complete, and invalidated the need for a
> client to specify cancel=false. Unfortunately the config option was not
> removed.
> We have seen cases where oozie "java actions" and some users were explicitly
> disabling token cancellation. This can lead to a buildup of defunct tokens
> that may overwhelm the ZK buffer used by the KDC's backing store, at which
> point the KMS fails to connect to ZK and is unable to issue/validate new
> tokens, rendering the KDC only able to authenticate pre-existing tokens.
> Production incidents have occurred due to the buffer size issue.
> To avoid these issues, the RM should have the option to ignore/override the
> client's request to not cancel tokens.
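
For readers unfamiliar with the mechanics described above, here is a minimal sketch of how ref-counted token cancellation and an RM-side override could interact. It is not the code from the patch: the class, the method names, and the alwaysCancel flag are hypothetical and do not correspond to actual Hadoop classes or configuration keys. It also assumes, for simplicity, that the cancel preference of the last app to finish is the one that applies to a shared token.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; not the ResourceManager implementation.
class TokenCancelSketch {
    // Hypothetical RM-side switch: when true, a client's cancel=false is ignored.
    private final boolean alwaysCancel;
    // Token id -> ids of the apps still using it (the reference count).
    private final Map<String, Set<String>> appsByToken = new HashMap<>();
    // App id -> whether the client asked for cancellation on completion.
    private final Map<String, Boolean> clientWantsCancel = new HashMap<>();
    // Tokens that have been cancelled so far.
    private final Set<String> cancelled = new HashSet<>();

    TokenCancelSketch(boolean alwaysCancel) {
        this.alwaysCancel = alwaysCancel;
    }

    // Register an app as a user of the token.
    void appSubmitted(String appId, String tokenId, boolean cancelWhenComplete) {
        appsByToken.computeIfAbsent(tokenId, k -> new HashSet<>()).add(appId);
        clientWantsCancel.put(appId, cancelWhenComplete);
    }

    // Drop the app's reference; cancel only once no app references the token.
    void appCompleted(String appId, String tokenId) {
        Set<String> apps = appsByToken.get(tokenId);
        if (apps == null || !apps.remove(appId)) {
            return; // unknown token, or app was not using it
        }
        Boolean requested = clientWantsCancel.remove(appId);
        if (!apps.isEmpty()) {
            return; // a sub-job still needs the token, so keep it alive
        }
        appsByToken.remove(tokenId);
        // Assumption: the last app's preference decides, unless the RM overrides it.
        boolean shouldCancel = alwaysCancel || requested == null || requested;
        if (shouldCancel) {
            cancelled.add(tokenId);
        }
    }

    boolean isCancelled(String tokenId) {
        return cancelled.contains(tokenId);
    }

    public static void main(String[] args) {
        for (boolean override : new boolean[] {false, true}) {
            TokenCancelSketch rm = new TokenCancelSketch(override);
            rm.appSubmitted("launcher", "t1", false); // client sets cancel=false
            rm.appSubmitted("subjob", "t1", false);
            rm.appCompleted("launcher", "t1");
            boolean afterLauncher = rm.isCancelled("t1");
            rm.appCompleted("subjob", "t1");
            System.out.println("alwaysCancel=" + override
                + " cancelledAfterLauncher=" + afterLauncher
                + " cancelledAfterSubjob=" + rm.isCancelled("t1"));
        }
    }
}
```

In this sketch the reference count alone keeps the token alive while the sub-job runs, so cancel=false is not needed to protect it. With the override off, the token is never cancelled and is left behind as a defunct token; with the override on, it is cancelled as soon as the last app using it completes.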