[ 
https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3104:
-----------------------------
    Summary: RM generates new AMRM tokens every heartbeat between rolling and 
activation  (was: RM continues to send new AMRM tokens every heartbeat between 
rolling and activation)

Yes, the connection is not re-established, so the updated token in the client's 
UGI is never re-sent to the RPC server.  Therefore, every time the RM asks the 
RPC server for the client's UGI, it gets the old one.  Since the RM thinks the 
client is still using the token that was used when the connection was 
established, it regenerates a token (and emits the corresponding log message) 
on every heartbeat during the interval between when the new key is rolled and 
when it is activated (i.e., as long as nextMasterKey != null).

To tell whether the client really is using the new token, we need either the 
RPC connection to be re-established or a way to tell the RPC layer to 
re-authenticate the connection.  I don't believe there's a good way to do 
either of those given the RPC API, so this patch works around the issue by 
comparing the token we have recorded for the app attempt with the next key.  
That avoids regenerating tokens unnecessarily for the same app attempt.  
However, we will continue to send the token each heartbeat since we cannot 
tell whether the client really has the new token.  I tweaked the summary 
accordingly.
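
For illustration only, here is a rough, self-contained model of the behavior 
before and after the workaround.  It is not the patch itself: the class, field, 
and method names (AmrmTokenModel, connectionKeyId, recordedKeyId, 
heartbeatBefore, heartbeatAfter) are all hypothetical, and tokens are reduced 
to the id of the key that signed them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these names are real YARN classes.
class AmrmTokenModel {
    // Id of the active key, and of the rolled-but-not-yet-activated key
    // (null when no roll is pending).
    int currentKeyId = 1;
    Integer nextKeyId = null;

    // Key id of the token each attempt presented when its RPC connection was
    // established. It never changes here because nobody reconnects.
    final Map<String, Integer> connectionKeyId = new HashMap<>();
    // Key id of the token most recently generated for each attempt.
    final Map<String, Integer> recordedKeyId = new HashMap<>();
    // Counts how many tokens were generated in total.
    int tokensGenerated = 0;

    void register(String attempt) {
        connectionKeyId.put(attempt, currentKeyId);
        recordedKeyId.put(attempt, currentKeyId);
    }

    void rollKey() {
        nextKeyId = currentKeyId + 1;
    }

    // Must only be called while a roll is pending (nextKeyId != null).
    void activateKey() {
        currentKeyId = nextKeyId;
        nextKeyId = null;
    }

    // Behavior described in the issue: the decision is based only on the
    // (stale) connection token, so a new token is generated every heartbeat.
    // Returns true when a token is sent in the heartbeat response.
    boolean heartbeatBefore(String attempt) {
        if (nextKeyId != null
                && !nextKeyId.equals(connectionKeyId.get(attempt))) {
            tokensGenerated++;
            recordedKeyId.put(attempt, nextKeyId);
            return true;
        }
        return false;
    }

    // Workaround as described above: generate a token only if the one recorded
    // for the attempt was not made with the next key, but still send a token
    // every heartbeat because we cannot tell whether the client has it.
    boolean heartbeatAfter(String attempt) {
        if (nextKeyId != null
                && !nextKeyId.equals(connectionKeyId.get(attempt))) {
            if (!nextKeyId.equals(recordedKeyId.get(attempt))) {
                tokensGenerated++;
                recordedKeyId.put(attempt, nextKeyId);
            }
            return true;
        }
        return false;
    }
}
```

In this model, five heartbeats after a roll generate five tokens with 
heartbeatBefore but only one with heartbeatAfter, while both still return a 
token on every heartbeat until the key is activated.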

> RM generates new AMRM tokens every heartbeat between rolling and activation
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3104
>                 URL: https://issues.apache.org/jira/browse/YARN-3104
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-3104.001.patch
>
>
> When the RM rolls a new AMRM secret, it conveys this to the AMs when it 
> notices they are still connected with the old key.  However, neither the RM 
> nor the AM explicitly closes the connection or otherwise tries to reconnect 
> with the new secret.  Therefore the RM keeps thinking the AM doesn't have the 
> new token on every heartbeat and keeps sending new tokens for the period 
> between the key roll and the key activation.  Once the key is activated, the 
> RM no longer squawks in its logs about needing to generate a new token every 
> heartbeat (i.e., every second) for every app, but the apps can still be using 
> the old token.  The token is only checked upon connection to the RM.  The 
> apps don't reconnect when sent a new token, and the RM doesn't force them to 
> reconnect by closing the connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
