Abhey Rana created YARN-11776:
---------------------------------
Summary: Handle NPE in the RMDelegationTokenIdentifier if
localServiceAddress is null
Key: YARN-11776
URL: https://issues.apache.org/jira/browse/YARN-11776
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 3.4.1
Reporter: Abhey Rana
In our production environment, jobs submitted with an RM delegation token
failed continually after an RM failover took place. On further investigation,
we traced the failures to the following stack trace:
{code:java}
2025-02-24 11:23:21,511 WARN [DelegationTokenRenewer #400699] security.DelegationTokenRenewer - Unable to add the application to the delegation token renewer.
java.lang.NullPointerException
    at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:144)
    at org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:97)
    at org.apache.hadoop.security.token.Token.renew(Token.java:500)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:661)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:658)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:657)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:519)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1800(DelegationTokenRenewer.java:83)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1067)
    at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1044)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
{code}
We suspect that localServiceAddress was not instantiated correctly because of
an internal network issue.
However, in our opinion, a null check should be added for this variable
regardless.
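
To illustrate the kind of guard being proposed, here is a minimal standalone sketch. It is not the actual Hadoop code: the class LocalServiceCheck and the method isLocalService are hypothetical names, and the wildcard branch is simplified to a port comparison. It only shows how a null (or unresolved) localServiceAddress could be treated as "not our token" instead of throwing an NPE.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper, not part of Hadoop: a null-safe "is this the local
// RM service?" check modelled loosely on what a renewer might need.
final class LocalServiceCheck {

    private LocalServiceCheck() {
    }

    /**
     * Returns true only when tokenAddr can be matched against a known local
     * service address. A missing or unresolved local address yields false
     * rather than a NullPointerException.
     */
    static boolean isLocalService(InetSocketAddress localServiceAddress,
                                  InetSocketAddress tokenAddr) {
        // Guard 1: the local address was never set (the case in this report).
        if (localServiceAddress == null || tokenAddr == null) {
            return false;
        }
        // Guard 2: the address object exists but its host did not resolve,
        // in which case getAddress() returns null.
        InetAddress localIp = localServiceAddress.getAddress();
        if (localIp == null) {
            return false;
        }
        // Simplification (assumption): for a wildcard bind address, only the
        // port is compared here.
        if (localIp.isAnyLocalAddress()) {
            return tokenAddr.getPort() == localServiceAddress.getPort();
        }
        return tokenAddr.equals(localServiceAddress);
    }
}
```

With a guard like this, the caller could fall back to the remote-client path when the check returns false. Whether that fallback is the right behaviour after an RM failover is a design question for the actual patch.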
--
This message was sent by Atlassian Jira
(v8.20.10#820010)