Jim Brennan created YARN-10348:
----------------------------------

Summary: Allow RM to always cancel tokens after app completes
Key: YARN-10348
URL: https://issues.apache.org/jira/browse/YARN-10348
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.1.3, 2.10.0
Reporter: Jim Brennan
Assignee: Jim Brennan
(Note: this change was originally made on our internal branch by [~daryn].)

The RM currently has an option that lets a client disable token cancellation when a job completes. This feature was an initial attempt to address the use case of a job that launches sub-jobs (i.e., the Oozie launcher) and finishes before the sub-job(s) complete. In that case, completion of the original job triggered premature cancellation of tokens that the sub-jobs still needed.

Many years ago, [~daryn] added a more robust implementation that ref-counts tokens ([YARN-3055]). This prevents a token from being cancelled until all apps using it have completed, which eliminated the need for a client to specify cancel=false. Unfortunately, the config option was not removed.

We have seen cases where Oozie "java actions" and some users explicitly disable token cancellation. This can lead to a buildup of defunct tokens that may overwhelm the ZK buffer used by the KDC's backing store, at which point the KMS fails to connect to ZK and is unable to issue or validate new tokens, leaving the KDC able to authenticate only pre-existing tokens. Production incidents have occurred because of this buffer size issue.

To avoid these issues, the RM should have an option to ignore (override) the client's request not to cancel tokens.
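To make the interaction concrete, here is a minimal Java sketch of ref-counted token cancellation combined with an RM-side override of the client's cancel=false request. It is not Hadoop's actual implementation: the class `TokenRefCounter`, its methods, and the `alwaysCancel` flag are hypothetical names chosen for illustration, and the rule that a single opt-out from any app sharing the token suppresses cancellation is an assumption made to keep the example small.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Hadoop's API): tokens are ref-counted by the apps
 * that use them, and an RM-side switch can override a client's cancel=false.
 */
public class TokenRefCounter {
    // Hypothetical RM-side switch: when true, client opt-outs are ignored.
    private final boolean alwaysCancel;
    // token id -> ids of running apps that still reference the token
    private final Map<String, Set<String>> appsByToken = new HashMap<>();
    // tokens for which some app asked not to cancel (assumed: any opt-out counts)
    private final Set<String> doNotCancel = new HashSet<>();
    // tokens that were cancelled (stands in for a call to the token issuer)
    private final Set<String> cancelled = new HashSet<>();

    public TokenRefCounter(boolean alwaysCancel) {
        this.alwaysCancel = alwaysCancel;
    }

    /** Registers an app as a user of a token; cancelOnFinish mirrors the client request. */
    public void appSubmitted(String appId, String tokenId, boolean cancelOnFinish) {
        appsByToken.computeIfAbsent(tokenId, k -> new HashSet<>()).add(appId);
        if (!cancelOnFinish) {
            doNotCancel.add(tokenId);
        }
    }

    /** Drops the app's reference and cancels the token once no running app uses it. */
    public void appFinished(String appId, String tokenId) {
        Set<String> apps = appsByToken.get(tokenId);
        if (apps == null) {
            return; // unknown token: nothing to do
        }
        apps.remove(appId);
        if (!apps.isEmpty()) {
            return; // another app (e.g. a sub-job) still needs the token
        }
        appsByToken.remove(tokenId);
        // Last reference gone: cancel unless a client opted out and the RM honors it.
        if (alwaysCancel || !doNotCancel.contains(tokenId)) {
            cancelled.add(tokenId);
        }
        doNotCancel.remove(tokenId);
    }

    public boolean isCancelled(String tokenId) {
        return cancelled.contains(tokenId);
    }

    public static void main(String[] args) {
        // A launcher and its sub-job share one token; the launcher asks not to cancel.
        TokenRefCounter rm = new TokenRefCounter(false);
        rm.appSubmitted("launcher", "tok1", false);
        rm.appSubmitted("subjob", "tok1", true);
        rm.appFinished("launcher", "tok1");
        System.out.println(rm.isCancelled("tok1")); // false: sub-job still running
        rm.appFinished("subjob", "tok1");
        System.out.println(rm.isCancelled("tok1")); // false: opt-out honored, token lingers

        // Same sequence with the override enabled: the token is cleaned up.
        TokenRefCounter strictRm = new TokenRefCounter(true);
        strictRm.appSubmitted("launcher", "tok1", false);
        strictRm.appSubmitted("subjob", "tok1", true);
        strictRm.appFinished("launcher", "tok1");
        strictRm.appFinished("subjob", "tok1");
        System.out.println(strictRm.isCancelled("tok1")); // true
    }
}
```

The ref count alone already protects the sub-job; the override only changes what happens after the last reference is dropped. That is why honoring cancel=false is no longer needed for correctness and only serves to leave defunct tokens behind.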