[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073416#comment-17073416
 ] 

Fang Liu commented on YARN-9768:
--------------------------------

[~maniraj...@gmail.com] thanks for the patch; it is really helpful.

I have one question. The method getTimerTask only ever creates a 
DelegationTokenRenewerAppRecoverEvent, yet it can be reached both when 
submitting a new app (through addApplicationAsync) and when recovering an app 
(through addApplicationAsyncDuringRecovery). Exception handling differs 
between the two cases:
 * For a newly submitted app, handleDTRenewerAppSubmitEvent is called. If a 
Throwable is thrown, the app is rejected.
 * For a recovered app, handleDTRenewerAppRecoverEvent is called. If a 
Throwable is thrown, only a warning is logged.

Therefore, should getTimerTask check the instance type of evt and create a 
DelegationTokenRenewerAppSubmitEvent or a DelegationTokenRenewerAppRecoverEvent 
accordingly?
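
To make the question concrete, here is a minimal, self-contained sketch of the
kind of type check I have in mind. The classes below (SketchEvent,
SketchSubmitEvent, SketchRecoverEvent, TimerTaskEventSketch) are hypothetical
stand-ins, not the real DelegationTokenRenewer classes, and the string results
only label the two failure paths described above.

```java
// Hypothetical stand-ins for the renewer events; not the real Hadoop classes.
abstract class SketchEvent {
    final String appId;
    SketchEvent(String appId) { this.appId = appId; }
}

// Models the event used when a new app is submitted.
class SketchSubmitEvent extends SketchEvent {
    SketchSubmitEvent(String appId) { super(appId); }
}

// Models the event used when an existing app is recovered.
class SketchRecoverEvent extends SketchEvent {
    SketchRecoverEvent(String appId) { super(appId); }
}

final class TimerTaskEventSketch {
    // Rebuilds the event for a retry while preserving its original kind.
    // The recover type is tested first so the check stays correct even if
    // one event type were a subclass of the other.
    static SketchEvent rebuildForRetry(SketchEvent evt) {
        if (evt instanceof SketchRecoverEvent) {
            return new SketchRecoverEvent(evt.appId);
        }
        return new SketchSubmitEvent(evt.appId);
    }

    // Labels the failure path each kind of event would take:
    // a failed submit rejects the app, a failed recovery only logs a warning.
    static String onFailure(SketchEvent evt) {
        return (evt instanceof SketchRecoverEvent) ? "LOG_WARN" : "REJECT_APP";
    }

    public static void main(String[] args) {
        SketchEvent retried = rebuildForRetry(new SketchSubmitEvent("app_1"));
        System.out.println(retried.getClass().getSimpleName()
            + " -> " + onFailure(retried));
    }
}
```

With a check like this, a retried submit event would still go through the
reject-on-failure path instead of being downgraded to a warning.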

[~pgolash] FYI

> RM Renew Delegation token thread should timeout and retry
> ---------------------------------------------------------
>
>                 Key: YARN-9768
>                 URL: https://issues.apache.org/jira/browse/YARN-9768
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: CR Hota
>            Assignee: Manikandan R
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, 
> YARN-9768.009.patch, YARN-9768.010.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration 
> time. This call is made to an underlying HDFS NN or Router node (which 
> exposes the same APIs as the HDFS NN). If one of the nodes is bad and the 
> renew call hangs, the thread remains stuck indefinitely. Ideally, the thread 
> should time out the renewToken call and retry from the client's perspective.
>  
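
For readers new to the issue, the timeout-and-retry behaviour requested in the
description can be sketched roughly as below. This is a generic illustration
built only on java.util.concurrent, not the code in the attached patches; the
class name RenewWithTimeoutSketch, the method callWithTimeoutAndRetry and its
parameters are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a possibly blocking call with a per-attempt
// timeout and retries it a bounded number of times.
final class RenewWithTimeoutSketch {
    static <T> T callWithTimeoutAndRetry(Callable<T> call, long timeoutMs,
                                         int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        // Daemon threads so a call that ignores interruption cannot keep the JVM alive.
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            TimeoutException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = pool.submit(call);
                try {
                    // Wait at most timeoutMs for this attempt.
                    return future.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    // Give up on the stuck attempt and try again.
                    future.cancel(true);
                    last = e;
                }
            }
            // Every attempt timed out.
            throw last;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A caller would wrap the blocking renew call in the Callable. Failures other
than a timeout propagate to the caller unchanged (wrapped in an
ExecutionException by Future.get), so only hung calls are retried.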



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

