[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264823#comment-14264823
 ] 

Zhijie Shen commented on YARN-2958:
-----------------------------------

bq. And if it doesnt exist we store it as a new token(not update it). In this 
case, I think we should not overwrite the sequence number.

If you take a look at the old addStoreOrUpdateOps. Storing DT and writing last 
sequence number is put in the same opList, hence both are executed or neither. 
If znode doesn't exist when updating, we suspet DT is not written, and neither 
does the sequence number.

> RMStateStore seems to unnecessarily and wrongly store sequence number 
> separately
> --------------------------------------------------------------------------------
>
>                 Key: YARN-2958
>                 URL: https://issues.apache.org/jira/browse/YARN-2958
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Zhijie Shen
>            Assignee: Varun Saxena
>            Priority: Blocker
>         Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
> YARN-2958.003.patch
>
>
> It seems that RMStateStore updates last sequence number when storing or 
> updating each individual DT, to recover the latest sequence number when RM 
> restarting.
> First, the current logic seems to be problematic:
> {code}
>   public synchronized void updateRMDelegationTokenAndSequenceNumber(
>       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
>       int latestSequenceNumber) {
>     if(isFencedState()) {
>       LOG.info("State store is in Fenced state. Can't update RM Delegation 
> Token.");
>       return;
>     }
>     try {
>       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, 
> renewDate,
>           latestSequenceNumber);
>     } catch (Exception e) {
>       notifyStoreOperationFailed(e);
>     }
>   }
> {code}
> {code}
>   @Override
>   protected void updateStoredToken(RMDelegationTokenIdentifier id,
>       long renewDate) {
>     try {
>       LOG.info("updating RMDelegation token with sequence number: "
>           + id.getSequenceNumber());
>       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
>         renewDate, id.getSequenceNumber());
>     } catch (Exception e) {
>       LOG.error("Error in updating persisted RMDelegationToken with sequence 
> number: "
>             + id.getSequenceNumber());
>       ExitUtil.terminate(1, e);
>     }
>   }
> {code}
> According to code above, even when renewing a DT, the last sequence number is 
> updated in the store, which is wrong. For example, we have the following 
> sequence:
> 1. Get DT 1 (seq = 1)
> 2. Get DT 2( seq = 2)
> 3. Renew DT 1 (seq = 1)
> 4. Restart RM
> The stored and then recovered last sequence number is 1. It makes the next 
> created DT after RM restarting will conflict with DT 2 on sequence num.
> Second, the aforementioned bug doesn't happen actually, because the recovered 
> last sequence num has been overwritten at by the correctly one.
> {code}
>   public void recover(RMState rmState) throws Exception {
>     LOG.info("recovering RMDelegationTokenSecretManager.");
>     // recover RMDTMasterKeys
>     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
>       .getMasterKeyState()) {
>       addKey(dtKey);
>     }
>     // recover RMDelegationTokens
>     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
>         rmState.getRMDTSecretManagerState().getTokenState();
>     this.delegationTokenSequenceNumber =
>         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
>     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : 
> rmDelegationTokens
>       .entrySet()) {
>       addPersistedDelegationToken(entry.getKey(), entry.getValue());
>     }
>   }
> {code}
> The code above recovers delegationTokenSequenceNumber by reading the last 
> sequence number in the store. It could be wrong. Fortunately, 
> delegationTokenSequenceNumber updates it to the right number.
> {code}
>     if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
>       setDelegationTokenSeqNum(identifier.getSequenceNumber());
>     }
> {code}
> All the stored identifiers will be gone through, and 
> delegationTokenSequenceNumber will be set to the largest sequence number 
> among these identifiers. Therefore, new DT will be assigned a sequence number 
> which is always larger than that of all the recovered DT.
> To sum up, two negatives make a positive, but it's good to fix the issue. 
> Please let me know if I've missed something here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to