Xiaoqiao He created HADOOP-18881:
------------------------------------

             Summary: ZKDTSM could be stuck when meet znode version overflow
                 Key: HADOOP-18881
                 URL: https://issues.apache.org/jira/browse/HADOOP-18881
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Xiaoqiao He
            Assignee: Xiaoqiao He


ZKDTSM can get stuck when the version of the znode 
/zkdtsm/ZKDTSMRoot/ZKDTSMSeqNumRoot overflows the int range (2147483647). It 
does not recover even after restarting the application, which may be the YARN 
Router, DFS Router, KMS, or any other module that uses ZooKeeper to manage 
tokens. One workaround (not very smooth) is to delete this znode first and 
then restart the service.

The root cause is the following code snippet, combined with the fact that 
Curator is not compatible with version overflow. I have proposed a draft 
improvement in CURATOR-688. Any discussion is welcome on whether we can 
resolve this smoothly on the Hadoop side.

org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager#incrSharedCount

{code:java}
  private int incrSharedCount(SharedCount sharedCount, int batchSize)
      throws Exception {
    while (true) {
      // Loop until we successfully increment the counter
      VersionedValue<Integer> versionedValue = sharedCount.getVersionedValue();
      if (sharedCount.trySetCount(
          versionedValue, versionedValue.getValue() + batchSize)) {
        return versionedValue.getValue();
      }
    }
  }
{code}
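
To illustrate why the loop above may never exit, here is a minimal, 
self-contained sketch of the overflow itself. It is not Curator's or Hadoop's 
actual code: the class VersionOverflowSketch and the methods nextVersion and 
acceptsUpdate are hypothetical, and the assumption that a client-side cache 
only accepts an update when the incoming version is numerically greater is 
mine, made for illustration only.

```java
public class VersionOverflowSketch {

    // Java int arithmetic wraps silently: MAX_VALUE + 1 becomes MIN_VALUE.
    public static int nextVersion(int version) {
        return version + 1;
    }

    // Hypothetical stand-in for a "is the incoming version newer?" check
    // that assumes versions only ever grow.
    public static boolean acceptsUpdate(int cachedVersion, int incomingVersion) {
        return incomingVersion > cachedVersion;
    }

    public static void main(String[] args) {
        int cached = Integer.MAX_VALUE;      // 2147483647
        int next = nextVersion(cached);      // wraps to -2147483648

        System.out.println("next version    = " + next);
        // A monotonic comparison rejects the wrapped version, so a cache
        // built on it would keep the stale value and stale version.
        System.out.println("update accepted = " + acceptsUpdate(cached, next));
    }
}
```

If a cached version stays stale in this way, every compare-and-set attempt 
made with it would fail, and a retry loop with no exit condition other than 
success, such as the one in incrSharedCount, would spin indefinitely.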




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

