Abhey Rana created YARN-11819:
---------------------------------
Summary: Request a HDFS delegation token refresh even at DelegationTokenRenewerAppSubmitEvent
Key: YARN-11819
URL: https://issues.apache.org/jira/browse/YARN-11819
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.3.6, 3.5.0, 3.4.2
Reporter: Abhey Rana
Fix For: 3.5.0, 3.4.2
In our production environment, jobs submitted with an RM delegation token
were consistently failing to start.
On investigation, we traced the failures to the following stack trace:
{code}
java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:prod-EMPTY-hbase4a, Ident: (token for sfdc: HDFS_DELEGATION_TOKEN owner=hbase/[email protected], renewer=sfdc, realUser=, issueDate=1744651400720, maxDate=1745256200720, sequenceNumber=2575348, masterKeyId=790)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:533)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1800(DelegationTokenRenewer.java:83)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1067)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1044)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750){code}
Looking at the code, handleAppSubmitEvent does catch the IOException and
request a new HDFS delegation token, but only when the event is a
DelegationTokenRenewerAppRecoverEvent. An expired token that arrives with a
DelegationTokenRenewerAppSubmitEvent is not refreshed, so the submission fails.
Code pointer:
{code:java}
if (ioe instanceof SecretManager.InvalidToken
    && dttr.maxDate < Time.now()
    && evt instanceof DelegationTokenRenewerAppRecoverEvent
    && token.getKind().equals(HDFS_DELEGATION_KIND)) {
  LOG.info("Failed to renew hdfs token " + dttr
      + " on recovery as it expired, requesting new hdfs token for "
      + applicationId + ", user=" + evt.getUser(), ioe);
  requestNewHdfsDelegationTokenAsProxyUser(
      Arrays.asList(applicationId), evt.getUser(),
      evt.shouldCancelAtEnd());
  continue;
}{code}
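The proposed fix is to extend the event check with an OR condition so that it
also covers submit events. The new term must be parenthesized so it stays a
single operand in the surrounding && chain:
(evt instanceof DelegationTokenRenewerAppRecoverEvent || evt instanceof DelegationTokenRenewerAppSubmitEvent)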
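
A minimal, standalone sketch of how the widened check would behave. It does not use the Hadoop classes: RenewerEvent, AppSubmitEvent, AppRecoverEvent, OtherEvent and shouldRequestNewHdfsToken are hypothetical stand-ins introduced only for illustration, and the boolean invalidToken stands in for the "ioe instanceof SecretManager.InvalidToken" test. The stand-ins do not attempt to mirror the real class hierarchy in DelegationTokenRenewer.

```java
public class TokenRefreshCheck {
    // Hypothetical stand-ins for the YARN event types named in this issue.
    interface RenewerEvent {}
    static final class AppSubmitEvent implements RenewerEvent {}
    static final class AppRecoverEvent implements RenewerEvent {}
    static final class OtherEvent implements RenewerEvent {}

    static final String HDFS_DELEGATION_KIND = "HDFS_DELEGATION_TOKEN";

    // Mirrors the shape of the condition in the code pointer, with the
    // event check widened to accept submit events as well as recover events.
    static boolean shouldRequestNewHdfsToken(boolean invalidToken, long maxDate,
            long now, RenewerEvent evt, String tokenKind) {
        // Parenthesized in effect: evaluated as one term before the && chain.
        boolean eligibleEvent =
            evt instanceof AppRecoverEvent || evt instanceof AppSubmitEvent;
        return invalidToken
            && maxDate < now
            && eligibleEvent
            && HDFS_DELEGATION_KIND.equals(tokenKind);
    }

    public static void main(String[] args) {
        long maxDate = 1745256200720L; // maxDate from the stack trace above
        long afterExpiry = maxDate + 1;
        long beforeExpiry = maxDate - 1;

        // Expired HDFS token on submit: refresh is now requested.
        System.out.println(shouldRequestNewHdfsToken(
            true, maxDate, afterExpiry, new AppSubmitEvent(), HDFS_DELEGATION_KIND));
        // Expired HDFS token on recovery: unchanged behaviour.
        System.out.println(shouldRequestNewHdfsToken(
            true, maxDate, afterExpiry, new AppRecoverEvent(), HDFS_DELEGATION_KIND));
        // Token not yet past maxDate: no refresh.
        System.out.println(shouldRequestNewHdfsToken(
            true, maxDate, beforeExpiry, new AppSubmitEvent(), HDFS_DELEGATION_KIND));
    }
}
```

The other three conditions (InvalidToken, maxDate in the past, HDFS token kind) are left untouched, so the change only widens which events are eligible for a refresh.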