[
https://issues.apache.org/jira/browse/YARN-11776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shilun Fan reassigned YARN-11776:
---------------------------------
Assignee: Abhey Rana
> Handle NPE in the RMDelegationTokenIdentifier if localServiceAddress is null
> ----------------------------------------------------------------------------
>
> Key: YARN-11776
> URL: https://issues.apache.org/jira/browse/YARN-11776
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.4.1
> Reporter: Abhey Rana
> Assignee: Abhey Rana
> Priority: Major
> Labels: pull-request-available
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> We observed in our production environment that jobs submitted with an RM
> delegation token kept failing after an RM failover.
> On further investigation, we identified the following stack trace as the
> culprit:
> {code:java}
> 2025-02-24 11:23:21,511 WARN [DelegationTokenRenewer #400699] security.DelegationTokenRenewer - Unable to add the application to the delegation token renewer.
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:144)
>     at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:97)
>     at org.apache.hadoop.security.token.Token.renew(Token.java:500)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:661)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:658)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:657)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:519)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1800(DelegationTokenRenewer.java:83)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1067)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1044)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> {code}
> We suspect that localServiceAddress was not initialized correctly because of
> an internal network issue.
> Regardless, in our humble opinion a null check should be added for this variable.
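
For illustration, the kind of null check proposed above could look roughly like the sketch below. The source of RMDelegationTokenIdentifier is not included in this report, so the class name LocalAddressGuard and the method isLocalService are hypothetical stand-ins, not the actual Hadoop code or the actual patch. The sketch also guards against an unresolved address, because java.net.InetSocketAddress.getAddress() returns null in that case; whether that is what happened in the reported incident is an assumption.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper; not part of Hadoop. It only illustrates a defensive
// comparison against a possibly-null local service address.
final class LocalAddressGuard {

    private LocalAddressGuard() {
    }

    /**
     * Returns true only if both addresses are present, the local address is
     * resolved, and the two refer to the same host and port. A null or
     * unresolved local address is treated as "not local" instead of
     * triggering a NullPointerException.
     */
    static boolean isLocalService(InetSocketAddress tokenAddr,
                                  InetSocketAddress localServiceAddress) {
        if (tokenAddr == null || localServiceAddress == null) {
            return false;
        }
        // getAddress() is null when the socket address is unresolved.
        InetAddress localIp = localServiceAddress.getAddress();
        if (localIp == null) {
            return false;
        }
        return tokenAddr.equals(localServiceAddress);
    }

    public static void main(String[] args) {
        InetSocketAddress tokenAddr = new InetSocketAddress("127.0.0.1", 8032);
        InetSocketAddress unresolved =
            InetSocketAddress.createUnresolved("rm.invalid", 8032);

        System.out.println(isLocalService(tokenAddr, null));       // false
        System.out.println(isLocalService(tokenAddr, unresolved)); // false
        System.out.println(isLocalService(tokenAddr,
            new InetSocketAddress("127.0.0.1", 8032)));            // true
    }
}
```

With a guard of this shape, a missing local address would make the caller fall back to treating the token as remote rather than aborting the renewal. Whether that fallback is the right behavior for the RM is a design decision for the actual fix.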