[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varun Saxena updated YARN-2958: ------------------------------- Attachment: YARN-2958.002.patch > RMStateStore seems to unnecessarily and wronly store sequence number > separately > ------------------------------------------------------------------------------- > > Key: YARN-2958 > URL: https://issues.apache.org/jira/browse/YARN-2958 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Reporter: Zhijie Shen > Assignee: Varun Saxena > Priority: Blocker > Attachments: YARN-2958.001.patch, YARN-2958.002.patch > > > It seems that RMStateStore updates last sequence number when storing or > updating each individual DT, to recover the latest sequence number when RM > restarting. > First, the current logic seems to be problematic: > {code} > public synchronized void updateRMDelegationTokenAndSequenceNumber( > RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate, > int latestSequenceNumber) { > if(isFencedState()) { > LOG.info("State store is in Fenced state. Can't update RM Delegation > Token."); > return; > } > try { > updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, > renewDate, > latestSequenceNumber); > } catch (Exception e) { > notifyStoreOperationFailed(e); > } > } > {code} > {code} > @Override > protected void updateStoredToken(RMDelegationTokenIdentifier id, > long renewDate) { > try { > LOG.info("updating RMDelegation token with sequence number: " > + id.getSequenceNumber()); > rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id, > renewDate, id.getSequenceNumber()); > } catch (Exception e) { > LOG.error("Error in updating persisted RMDelegationToken with sequence > number: " > + id.getSequenceNumber()); > ExitUtil.terminate(1, e); > } > } > {code} > According to code above, even when renewing a DT, the last sequence number is > updated in the store, which is wrong. For example, we have the following > sequence: > 1. Get DT 1 (seq = 1) > 2. Get DT 2( seq = 2) > 3. Renew DT 1 (seq = 1) > 4. Restart RM > The stored and then recovered last sequence number is 1. It makes the next > created DT after RM restarting will conflict with DT 2 on sequence num. > Second, the aforementioned bug doesn't happen actually, because the recovered > last sequence num has been overwritten at by the correctly one. > {code} > public void recover(RMState rmState) throws Exception { > LOG.info("recovering RMDelegationTokenSecretManager."); > // recover RMDTMasterKeys > for (DelegationKey dtKey : rmState.getRMDTSecretManagerState() > .getMasterKeyState()) { > addKey(dtKey); > } > // recover RMDelegationTokens > Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens = > rmState.getRMDTSecretManagerState().getTokenState(); > this.delegationTokenSequenceNumber = > rmState.getRMDTSecretManagerState().getDTSequenceNumber(); > for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : > rmDelegationTokens > .entrySet()) { > addPersistedDelegationToken(entry.getKey(), entry.getValue()); > } > } > {code} > The code above recovers delegationTokenSequenceNumber by reading the last > sequence number in the store. It could be wrong. Fortunately, > delegationTokenSequenceNumber updates it to the right number. > {code} > if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) { > setDelegationTokenSeqNum(identifier.getSequenceNumber()); > } > {code} > All the stored identifiers will be gone through, and > delegationTokenSequenceNumber will be set to the largest sequence number > among these identifiers. Therefore, new DT will be assigned a sequence number > which is always larger than that of all the recovered DT. > To sum up, two negatives make a positive, but it's good to fix the issue. > Please let me know if I've missed something here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)