[ 
https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390005#comment-14390005
 ] 

zhihai xu commented on YARN-3429:
---------------------------------

I find out the test failure reason is because the configuration 
RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS and RM_AM_EXPIRY_INTERVAL_MS is 
changed to a smaller value at testMasterKeyRollOver. And the same configuration 
is used by testTokenExpiry, if launching AM container takes  more time than  19 
seconds = RM_AMRM_TOKEN_MASTER_KEY_ROLLING_INTERVAL_SECS(13) + 
RM_AM_EXPIRY_INTERVAL_MS(4) *1.5.
This failure will happen.

The following logs can prove it:
{code}
2015-03-31 12:25:56,869 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEvent.EventType:
 LAUNCH
2015-03-31 12:25:56,871 INFO  [pool-20-thread-1] 
security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for 
ApplicationAttempt: appattempt_1427804754787_0001_000001
2015-03-31 12:25:56,871 INFO  [pool-20-thread-1] 
security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:createPassword(307)) - Creating password for 
appattempt_1427804754787_0001_000001
2015-03-31 12:26:06,859 INFO  [main] security.TestAMRMTokens 
(TestAMRMTokens.java:testTokenExpiry(129)) - Waiting for AM Launch to happen..
2015-03-31 12:26:07,788 INFO  [Timer-24] security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:rollMasterKey(142)) - Rolling master-key for 
amrm-tokens
2015-03-31 12:26:12,861 INFO  [main] security.TestAMRMTokens 
(TestAMRMTokens.java:testTokenExpiry(129)) - Waiting for AM Launch to happen..
2015-03-31 12:26:13,790 INFO  [Timer-24] security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:activateNextMasterKey(166)) - Activating next 
master key with id: 391474792
2015-03-31 12:26:15,931 DEBUG [pool-20-thread-1] security.SecurityUtil 
(SecurityUtil.java:setTokenService(360)) - Acquired token Kind: 
YARN_AM_RM_TOKEN, Service: asf902.gq1.ygridcore.net:8030, Ident: (appAttemptId 
{ application_id { id: 1 cluster_timestamp: 1427804754787 } attemptId: 1 } 
keyId: 391474791)
2015-03-31 12:26:15,932 INFO  [pool-20-thread-1] amlauncher.AMLauncher 
(AMLauncher.java:launch(127)) - Done launching container Container: 
[ContainerId: container_1427804754787_0001_01_000001, NodeId: localhost:1234, 
NodeHttpAddress: localhost:2, Resource: <memory:1024, vCores:1>, Priority: 0, 
Token: Token { kind: ContainerToken, service: localhost:1234 }, ] for AM 
appattempt_1427804754787_0001_000001
{code}

The above log show:
Rolling master-key for amrm-tokens happened at 2015-03-31 12:26:07
after 6 seconds(2015-03-31 12:26:13),  Activating next master key with id: 
391474792(activateNextMasterKey) happened,
the keyId for the token used is 391474791, You can see the used keyId 391474791 
is just rolled over.
The following is the code of activateNextMasterKey:
You can see nextMasterKey is set null and  currentMasterKey is changed to 
391474792
{code}
  public void activateNextMasterKey() {
    this.writeLock.lock();
    try {
      LOG.info("Activating next master key with id: "
          + this.nextMasterKey.getMasterKey().getKeyId());
      this.currentMasterKey = this.nextMasterKey;
      this.nextMasterKey = null;
      AMRMTokenSecretManagerState state =
          AMRMTokenSecretManagerState.newInstance(
            this.currentMasterKey.getMasterKey(), null);
      rmContext.getStateStore()
          .storeOrUpdateAMRMTokenSecretManager(state, true);
    } finally {
      this.writeLock.unlock();
    }
  }
{code} 

The master key roll over cause the following InvalidToken exception from 
retrievePassword.
{code}
Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid 
AMRMToken from appattempt_1427804754787_0001_000001
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager.retrievePassword(AMRMTokenSecretManager.java:263)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager.retrievePassword(AMRMTokenSecretManager.java:56)
        at 
org.apache.hadoop.security.token.SecretManager.retriableRetrievePassword(SecretManager.java:91)
        at 
org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:277)
        at 
org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:304)
        at 
com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
{code}

Based on the code of retrievePassword, 
the used keyId 391474791 doesn't match currentMasterKey which has keyId 
391474792 and also nextMasterKey is null,
so InvalidToken("Invalid AMRMToken from " + applicationAttemptId) is thrown.
{code}
  public byte[] retrievePassword(AMRMTokenIdentifier identifier)
      throws InvalidToken {
    this.readLock.lock();
    try {
      ApplicationAttemptId applicationAttemptId =
          identifier.getApplicationAttemptId();
      if (LOG.isDebugEnabled()) {
        LOG.debug("Trying to retrieve password for " + applicationAttemptId);
      }
      if (!appAttemptSet.contains(applicationAttemptId)) {
        throw new InvalidToken(applicationAttemptId
            + " not found in AMRMTokenSecretManager.");
      }
      if (identifier.getKeyId() == this.currentMasterKey.getMasterKey()
        .getKeyId()) {
        return createPassword(identifier.getBytes(),
          this.currentMasterKey.getSecretKey());
      } else if (nextMasterKey != null
          && identifier.getKeyId() == this.nextMasterKey.getMasterKey()
            .getKeyId()) {
        return createPassword(identifier.getBytes(),
          this.nextMasterKey.getSecretKey());
      }
      throw new InvalidToken("Invalid AMRMToken from " + applicationAttemptId);
    } finally {
      this.readLock.unlock();
    }
  }
{code}

> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken from appattempt_1427804754787_0001_000001
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3429
>                 URL: https://issues.apache.org/jira/browse/YARN-3429
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>
> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken from appattempt_1427804754787_0001_000001
> The error logs is at 
> https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to