[ https://issues.apache.org/jira/browse/YARN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated YARN-3104:
-----------------------------
    Summary: RM generates new AMRM tokens every heartbeat between rolling and activation  (was: RM continues to send new AMRM tokens every heartbeat between rolling and activation)

Yes, the connection is not re-established so the updated token in the client's UGI is never re-sent to the RPC server. Therefore every time the RM asks the RPC server for the client's UGI we will continue to get the old one. Since the RM thinks the client is still using the token that was used when the connection was established, it continues to regenerate tokens (and emit corresponding logs) every heartbeat for the interval between when the new key was rolled and when it is activated (i.e.: as long as nextMasterKey != null).

To tell whether the client really is using the new token we either need the RPC connection to be re-established or a way to tell the RPC layer to re-authenticate the connection. I don't believe there's a good way to do either of those given the RPC API, so this patch works around the issue a bit by comparing the token we have recorded for the app attempt with the next key. It solves the problem of regenerating tokens unnecessarily for the same app attempt. However we will continue to send the token each heartbeat since we cannot tell whether the client really has the new token. I tweaked the summary accordingly.

> RM generates new AMRM tokens every heartbeat between rolling and activation
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3104
>                 URL: https://issues.apache.org/jira/browse/YARN-3104
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-3104.001.patch
>
>
> When the RM rolls a new AMRM secret, it conveys this to the AMs when it
> notices they are still connected with the old key. However neither the RM
> nor the AM explicitly close the connection or otherwise try to reconnect with
> the new secret. Therefore the RM keeps thinking the AM doesn't have the new
> token on every heartbeat and keeps sending new tokens for the period between
> the key roll and the key activation. Once activated the RM no longer squawks
> in its logs about needing to generate a new token every heartbeat (i.e.:
> second) for every app, but the apps can still be using the old token. The
> token is only checked upon connection to the RM. The apps don't reconnect
> when sent a new token, and the RM doesn't force them to reconnect by closing
> the connection.
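
For readers following along, the workaround described in the comment above (compare the token recorded for the app attempt with the next key, and only generate a new token when they differ) can be sketched roughly as follows. This is a simplified model and not the code in YARN-3104.001.patch: the class `AmrmTokenRollSketch`, its `Token` type, the integer key ids, and every method name here are hypothetical stand-ins rather than Hadoop APIs. The `connectionKeyId` argument models the stale UGI that the RPC server keeps returning because the connection is never re-established.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behaviour discussed in this issue.
// None of these types or methods are Hadoop classes.
final class AmrmTokenRollSketch {

    /** Stand-in for an AMRM token: it only remembers which master key signed it. */
    static final class Token {
        final String attemptId;
        final int keyId;

        Token(String attemptId, int keyId) {
            this.attemptId = attemptId;
            this.keyId = keyId;
        }
    }

    private int currentKeyId;
    private Integer nextKeyId;                       // non-null only between roll and activation
    private final Map<String, Token> recorded = new HashMap<>();
    int tokensGenerated = 0;                         // counts every token created

    AmrmTokenRollSketch(int initialKeyId) {
        this.currentKeyId = initialKeyId;
    }

    /** Issues the initial token for an app attempt, signed with the current key. */
    Token register(String attemptId) {
        Token token = new Token(attemptId, currentKeyId);
        tokensGenerated++;
        recorded.put(attemptId, token);
        return token;
    }

    /** Rolls a new key; it stays pending until activateNextMasterKey() is called. */
    void rollMasterKey(int newKeyId) {
        nextKeyId = newKeyId;
    }

    void activateNextMasterKey() {
        if (nextKeyId != null) {
            currentKeyId = nextKeyId;
            nextKeyId = null;
        }
    }

    /**
     * Returns the token to send back on this heartbeat, or null if none is needed.
     * connectionKeyId is the key id of the token the RPC connection was
     * authenticated with; it never changes while the connection stays open.
     */
    Token heartbeat(String attemptId, int connectionKeyId) {
        if (nextKeyId == null || connectionKeyId == nextKeyId) {
            return null;                             // no pending roll, or client already on the new key
        }
        Token token = recorded.get(attemptId);
        if (token == null || token.keyId != nextKeyId) {
            // Only generate when the recorded token was not made with the next key.
            token = new Token(attemptId, nextKeyId);
            tokensGenerated++;
            recorded.put(attemptId, token);
        }
        // Still sent every heartbeat: the RM cannot tell whether the client has it.
        return token;
    }
}
```

In this model, repeated heartbeats over the same old connection return the same recorded token instead of creating a fresh one each time, which matches the stated goal of not regenerating tokens for the same app attempt while still re-sending the token on every heartbeat.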