Abhey Rana created YARN-11776:
---------------------------------

             Summary: Handle NPE in the RMDelegationTokenIdentifier if 
localServiceAddress is null
                 Key: YARN-11776
                 URL: https://issues.apache.org/jira/browse/YARN-11776
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 3.4.1
            Reporter: Abhey Rana


We observed in our production environment that the jobs submitted with a RM 
delegation token were continually failing after the RM failover took place.

Upon further investigation we figured out the following Stack Trace as the 
culprit -


{code:java}
2025-02-24 11:23:21,511 WARN  [DelegationTokenRenewer #400699] 
security.DelegationTokenRenewer - Unable to add the application to the 
delegation token renewer.
java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:144)
        at 
org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:97)
        at org.apache.hadoop.security.token.Token.renew(Token.java:500)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:661)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:658)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:657)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:519)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1800(DelegationTokenRenewer.java:83)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:1067)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:1044)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}

We anticipate that there's an issue the way localServiceAddress is instantiated 
due to the internal network issue.

However, In our humble opinoin we should add a null check for this variable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to