[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263602#comment-14263602 ]
Varun Saxena commented on YARN-2958: ------------------------------------ [~zjshen], thanks for the review. Please find my replies below. bq. No need to add isUpdateSeqNo. Updating a non-existing znode is storing a DT, we should update the seq number of it. So we just need to use isUpdate The reason I added this new flag is because when we update the Delegation token, we first check whether znode exists or not. And if it doesnt exist we store it as a new token(not update it). In this case, I think we should not overwrite the sequence number. Now I am not sure if non existence of znode while updating DT is a valid use case(could not think of any) or just defensive programming but anyhow we store DT if znode to be updated is not found.. Refer to code below. {code:title=ZKRMStateStore.java} protected synchronized void updateRMDelegationTokenAndSequenceNumberInternal( RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, int latestSequenceNumber) throws Exception { ........... if (existsWithRetries(nodeRemovePath, true) == null) { // in case znode doesn't exist addStoreOrUpdateOps( opList, rmDTIdentifier, renewDate, false, false); LOG.debug("Attempted to update a non-existing znode " + nodeRemovePath); } else { // in case znode exists addStoreOrUpdateOps( opList, rmDTIdentifier, renewDate, true, false); } ...... } {code} bq. store|updateRMDelegationTokenAndSequenceNumber is better to be renamed to store|updateRMDelegationToken Ok. Will change. bq. Instead of changing sequenceNumber to 0, can we set 1111 to dtId1 and verify it later? Will do so. > RMStateStore seems to unnecessarily and wrongly store sequence number > separately > -------------------------------------------------------------------------------- > > Key: YARN-2958 > URL: https://issues.apache.org/jira/browse/YARN-2958 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Reporter: Zhijie Shen > Assignee: Varun Saxena > Priority: Blocker > Attachments: YARN-2958.001.patch, YARN-2958.002.patch, > YARN-2958.003.patch > > > It seems that RMStateStore updates last sequence number when storing or > updating each individual DT, to recover the latest sequence number when RM > restarting. > First, the current logic seems to be problematic: > {code} > public synchronized void updateRMDelegationTokenAndSequenceNumber( > RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, > int latestSequenceNumber) { > if(isFencedState()) { > LOG.info("State store is in Fenced state. Can't update RM Delegation > Token."); > return; > } > try { > updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, > renewDate, > latestSequenceNumber); > } catch (Exception e) { > notifyStoreOperationFailed(e); > } > } > {code} > {code} > @Override > protected void updateStoredToken(RMDelegationTokenIdentifier id, > long renewDate) { > try { > LOG.info("updating RMDelegation token with sequence number: " > + id.getSequenceNumber()); > rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, > renewDate, id.getSequenceNumber()); > } catch (Exception e) { > LOG.error("Error in updating persisted RMDelegationToken with sequence > number: " > + id.getSequenceNumber()); > ExitUtil.terminate(1, e); > } > } > {code} > According to code above, even when renewing a DT, the last sequence number is > updated in the store, which is wrong. For example, we have the following > sequence: > 1. Get DT 1 (seq = 1) > 2. Get DT 2( seq = 2) > 3. Renew DT 1 (seq = 1) > 4. Restart RM > The stored and then recovered last sequence number is 1. It makes the next > created DT after RM restarting will conflict with DT 2 on sequence num. > Second, the aforementioned bug doesn't happen actually, because the recovered > last sequence num has been overwritten at by the correctly one. > {code} > public void recover(RMState rmState) throws Exception { > LOG.info("recovering RMDelegationTokenSecretManager."); > // recover RMDTMasterKeys > for (DelegationKey dtKey : rmState.getRMDTSecretManagerState() > .getMasterKeyState()) { > addKey(dtKey); > } > // recover RMDelegationTokens > Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens = > rmState.getRMDTSecretManagerState().getTokenState(); > this.delegationTokenSequenceNumber = > rmState.getRMDTSecretManagerState().getDTSequenceNumber(); > for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : > rmDelegationTokens > .entrySet()) { > addPersistedDelegationToken(entry.getKey(), entry.getValue()); > } > } > {code} > The code above recovers delegationTokenSequenceNumber by reading the last > sequence number in the store. It could be wrong. Fortunately, > delegationTokenSequenceNumber updates it to the right number. > {code} > if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) { > setDelegationTokenSeqNum(identifier.getSequenceNumber()); > } > {code} > All the stored identifiers will be gone through, and > delegationTokenSequenceNumber will be set to the largest sequence number > among these identifiers. Therefore, new DT will be assigned a sequence number > which is always larger than that of all the recovered DT. > To sum up, two negatives make a positive, but it's good to fix the issue. > Please let me know if I've missed something here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)