[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors
[ https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258726#comment-14258726 ]

Hudson commented on YARN-2988:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #786 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/786/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java

Graph#save() may leak file descriptors
--------------------------------------

Key: YARN-2988
URL: https://issues.apache.org/jira/browse/YARN-2988
Project: Hadoop YARN
Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
Fix For: 2.7.0
Attachments: YARN-2988-001.patch, YARN-2988-002.patch

In hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java :
{code}
  public void save(String filepath) throws IOException {
    OutputStreamWriter fout = new OutputStreamWriter(
        new FileOutputStream(filepath), Charset.forName("UTF-8"));
    fout.write(generateGraphViz());
    fout.close();
  }
{code}
The close of fout should be enclosed in a finally clause; as written, the file descriptor is leaked if write() throws before close() is reached.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
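For readers following YARN-2988, here is a minimal sketch of the two usual ways to guarantee the close. It is not the committed patch (the patch contents are not quoted in this thread). The class name GraphSaveSketch and the body of generateGraphViz() are placeholders assumed for illustration; only the shape of save() comes from the issue.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the class that owns save(); not the real Graph class. */
class GraphSaveSketch {

  /** Placeholder for generateGraphViz(); the real output is not shown in the issue. */
  static String generateGraphViz() {
    return "digraph G {\n}\n";
  }

  /** Variant 1: close() sits in a finally block, so it runs even if write() throws. */
  static void saveWithFinally(String filepath) throws IOException {
    OutputStreamWriter fout = new OutputStreamWriter(
        new FileOutputStream(filepath), StandardCharsets.UTF_8);
    try {
      fout.write(generateGraphViz());
    } finally {
      fout.close();
    }
  }

  /** Variant 2: try-with-resources (Java 7+) closes the writer automatically. */
  static void saveWithResources(String filepath) throws IOException {
    try (Writer fout = new OutputStreamWriter(
        new FileOutputStream(filepath), StandardCharsets.UTF_8)) {
      fout.write(generateGraphViz());
    }
  }
}
```

Both variants release the descriptor on the failure path. The second is shorter and also preserves the original exception if close() itself fails, at the cost of requiring Java 7 or later.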
[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors
[ https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258734#comment-14258734 ]

Hudson commented on YARN-2988:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #52 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/52/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors
[ https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258761#comment-14258761 ]

Hudson commented on YARN-2988:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1984 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1984/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors
[ https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258765#comment-14258765 ]

Hudson commented on YARN-2988:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #49 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/49/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors
[ https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258773#comment-14258773 ]

Hudson commented on YARN-2988:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #53 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/53/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors
[ https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258786#comment-14258786 ]

Hudson commented on YARN-2988:
------------------------------

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2003 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2003/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-2958:
-------------------------------
    Attachment: YARN-2958.001.patch

RMStateStore seems to unnecessarily and wronly store sequence number separately
-------------------------------------------------------------------------------

Key: YARN-2958
URL: https://issues.apache.org/jira/browse/YARN-2958
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
Attachments: YARN-2958.001.patch

It seems that RMStateStore updates the last sequence number whenever it stores or updates an individual DT, so that the latest sequence number can be recovered when the RM restarts.

First, the current logic seems to be problematic:
{code}
  public synchronized void updateRMDelegationTokenAndSequenceNumber(
      RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
      int latestSequenceNumber) {
    if(isFencedState()) {
      LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
      return;
    }
    try {
      updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
          latestSequenceNumber);
    } catch (Exception e) {
      notifyStoreOperationFailed(e);
    }
  }
{code}
{code}
  @Override
  protected void updateStoredToken(RMDelegationTokenIdentifier id,
      long renewDate) {
    try {
      LOG.info("updating RMDelegation token with sequence number: "
          + id.getSequenceNumber());
      rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
          renewDate, id.getSequenceNumber());
    } catch (Exception e) {
      LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
          + id.getSequenceNumber());
      ExitUtil.terminate(1, e);
    }
  }
{code}
According to the code above, the last sequence number is updated in the store even when a DT is merely renewed, which is wrong. For example, take the following sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The stored, and then recovered, last sequence number is 1. As a result, the next DT created after the RM restarts would conflict with DT 2 on sequence number.

Second, the aforementioned bug doesn't actually occur, because the recovered last sequence number is overwritten by the correct one.
{code}
  public void recover(RMState rmState) throws Exception {

    LOG.info("recovering RMDelegationTokenSecretManager.");
    // recover RMDTMasterKeys
    for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
        .getMasterKeyState()) {
      addKey(dtKey);
    }

    // recover RMDelegationTokens
    Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
        rmState.getRMDTSecretManagerState().getTokenState();
    this.delegationTokenSequenceNumber =
        rmState.getRMDTSecretManagerState().getDTSequenceNumber();
    for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
        .entrySet()) {
      addPersistedDelegationToken(entry.getKey(), entry.getValue());
    }
  }
{code}
The code above recovers delegationTokenSequenceNumber by reading the last sequence number from the store, which could be wrong. Fortunately, delegationTokenSequenceNumber is subsequently updated to the right number:
{code}
    if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
      setDelegationTokenSeqNum(identifier.getSequenceNumber());
    }
{code}
All the stored identifiers are iterated over, and delegationTokenSequenceNumber is set to the largest sequence number among them. Therefore, a new DT will always be assigned a sequence number larger than that of every recovered DT.

To sum up, two negatives make a positive, but it's still good to fix the issue. Please let me know if I've missed something here.
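The four-step scenario in the YARN-2958 description can be replayed with a small stand-alone model. This is only a sketch under stated assumptions: SequenceNumberSketch, Store, storeNewToken, renewToken and recover are hypothetical names that mimic the described behaviour, not RMStateStore or RMDelegationTokenSecretManager code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the behaviour described in YARN-2958; not Hadoop code. */
class SequenceNumberSketch {

  /** Mimics a store that keeps each token's sequence number plus a separate "latest" value. */
  static final class Store {
    final List<Integer> tokenSeqNums = new ArrayList<>();
    int storedLatestSeqNum;

    /** Storing a new token records it and overwrites the separately stored latest number. */
    void storeNewToken(int seq) {
      tokenSeqNums.add(seq);
      storedLatestSeqNum = seq;
    }

    /** Renewing also overwrites the latest number: the first "negative" in the description. */
    void renewToken(int seq) {
      storedLatestSeqNum = seq;
    }
  }

  /**
   * Recovery starts from the stored latest number, then raises it to the largest
   * sequence number among the stored tokens: the second "negative" that masks the first.
   */
  static int recover(Store store) {
    int current = store.storedLatestSeqNum;
    for (int seq : store.tokenSeqNums) {
      if (seq > current) {
        current = seq;
      }
    }
    return current;
  }
}
```

Replaying "get DT 1, get DT 2, renew DT 1" against this model leaves storedLatestSeqNum at 1, yet recover() returns 2, which matches the conclusion that the separately stored number is both wrong and unnecessary.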
[jira] [Created] (YARN-2992) ZKRMStateStore crashes due to session expiry
Karthik Kambatla created YARN-2992:
--------------------------------------

Summary: ZKRMStateStore crashes due to session expiry
Key: YARN-2992
URL: https://issues.apache.org/jira/browse/YARN-2992
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker

We recently saw the RM crash with the following stacktrace. On session expiry, we should gracefully transition to standby.

{noformat}
2014-12-18 06:28:42,689 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
{noformat}
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258866#comment-14258866 ]

Hadoop QA commented on YARN-2958:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12689131/YARN-2958.001.patch
  against trunk revision a164ce2.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6190//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6190//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6190//console

This message is automatically generated.
[jira] [Updated] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-2992:
-----------------------------------
    Attachment: yarn-2992-1.patch

Patch that also retries on session changes and adds connection re-establishment.

Again, in the longer term, a Curator-based implementation would be both simpler and more robust.
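The idea of retrying after a session expiry and re-establishing the connection first can be sketched as below. This is not the contents of yarn-2992-1.patch, which is not quoted in this thread, and it deliberately avoids the ZooKeeper client API: SessionRetrySketch, SessionExpiredException, StoreAction and Reconnector are hypothetical types assumed purely for illustration.

```java
/** Hypothetical retry helper; it does not use or mirror any ZKRMStateStore internals. */
class SessionRetrySketch {

  /** Stand-in for a session-expiry failure reported by the store client. */
  static class SessionExpiredException extends Exception {
    private static final long serialVersionUID = 1L;
  }

  /** A store operation that may fail because the session has expired. */
  interface StoreAction<T> {
    T run() throws SessionExpiredException;
  }

  /** Re-establishes the connection (for example, by creating a new client). */
  interface Reconnector {
    void reconnect();
  }

  /**
   * Runs the action; on session expiry, reconnects and tries again, up to
   * maxRetries additional attempts. The last failure is rethrown so the caller
   * can decide what to do next (for instance, transition to standby).
   */
  static <T> T runWithRetries(StoreAction<T> action, Reconnector reconnector,
      int maxRetries) throws SessionExpiredException {
    int retries = 0;
    while (true) {
      try {
        return action.run();
      } catch (SessionExpiredException e) {
        if (++retries > maxRetries) {
          throw e;
        }
        reconnector.reconnect();
      }
    }
  }
}
```

Rethrowing once the retries are exhausted keeps the policy decision with the caller, which fits the issue's goal of a graceful transition to standby rather than a crash.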
[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-2958:
-------------------------------
    Attachment: YARN-2958.002.patch
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258875#comment-14258875 ]

Hadoop QA commented on YARN-2992:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12689132/yarn-2992-1.patch
  against trunk revision a164ce2.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6192//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6192//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6192//console

This message is automatically generated.
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258876#comment-14258876 ]

Hadoop QA commented on YARN-2958:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12689133/YARN-2958.002.patch
  against trunk revision a164ce2.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6191//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6191//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6191//console

This message is automatically generated.
[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry
[ https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258881#comment-14258881 ]

Rohith commented on YARN-2992:
--
It makes sense to me. +1 for the issue. I would also like to bring up the scenario where ZK is not available during RM startup: I have observed that the RM exits while starting if ZK is not available. Why can't the RM transition to standby?

ZKRMStateStore crashes due to session expiry

Key: YARN-2992
URL: https://issues.apache.org/jira/browse/YARN-2992
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
Attachments: yarn-2992-1.patch

We recently saw the RM crash with the following stacktrace. On session expiry, we should gracefully transition to standby.
{noformat}
2014-12-18 06:28:42,689 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
	at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958)
	at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687)
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated YARN-2993:
-
Summary: Several fixes (missing acl check, error log msg ...) and some refinement in AdminService (was: Several fixes (missing acl check, error log ...) and some refinement in AdminService)

Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

Key: YARN-2993
URL: https://issues.apache.org/jira/browse/YARN-2993
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu

This JIRA is to resolve the following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}:

*1.* There is no ACL check for {{refreshServiceAcls}}.

*2.* The log message in {{refreshAdminAcls}} is incorrect; it should be "... Can not refresh Admin ACLs." instead of "... Can not refresh user-groups."

*3.* Some unnecessary imports.

*4.*
{code}
if (!isRMActive()) {
  RMAuditLogger.logFailure(user.getShortUserName(), argName,
      adminAcl.toString(), "AdminService",
      "ResourceManager is not active. Can not remove labels.");
  throwStandbyException();
}
{code}
is common to many methods; only the message differs. We should factor it into one common method.

*5.*
{code}
LOG.info("Exception remove labels", ioe);
RMAuditLogger.logFailure(user.getShortUserName(), argName,
    adminAcl.toString(), "AdminService", "Exception remove label");
throw RPCUtil.getRemoteException(ioe);
{code}
is common to many methods; only the message differs. We should factor it into one common method.
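Items 4 and 5 above propose folding the repeated "RM is not active" check and the repeated failure-logging block into shared helpers. The following is a minimal sketch of that idea, not the attached patch. The class `AdminServiceSketch`, the helpers `checkRMStatus`, `logAndWrapException` and `execute`, the `AdminOp` interface, the in-memory `auditLog` and the local `StandbyException` are all hypothetical stand-ins, chosen so that the example runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AdminService; none of these names come from the patch.
class AdminServiceSketch {
    /** Local stand-in for the exception raised when the RM is in standby. */
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    /** An admin operation that may fail with an IOException. */
    interface AdminOp { void run() throws IOException; }

    private final boolean rmActive;
    /** Stand-in for RMAuditLogger.logFailure(...): failures are recorded in memory. */
    final List<String> auditLog = new ArrayList<>();

    AdminServiceSketch(boolean rmActive) { this.rmActive = rmActive; }

    /** Item 4: one shared active check; only the trailing action text differs. */
    private void checkRMStatus(String user, String argName, String action)
            throws StandbyException {
        if (!rmActive) {
            String msg = "ResourceManager is not active. Can not " + action + ".";
            auditLog.add(user + "|" + argName + "|" + msg);
            throw new StandbyException(msg);
        }
    }

    /** Item 5: one shared failure path that audits the failure and wraps the cause. */
    private IOException logAndWrapException(IOException cause, String user,
            String argName, String action) {
        auditLog.add(user + "|" + argName + "|Exception " + action);
        return new IOException("Exception " + action, cause);
    }

    /** Every admin method can delegate to this instead of repeating both blocks. */
    void execute(String user, String argName, String action, AdminOp op)
            throws IOException {
        checkRMStatus(user, argName, action);
        try {
            op.run();
        } catch (IOException ioe) {
            throw logAndWrapException(ioe, user, argName, action);
        }
    }
}
```

With helpers of this shape, each admin method would pass only its own name and action text, so a wrong message such as the one in item 2 has a single place where it can go wrong.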
[jira] [Created] (YARN-2993) Several fixes (missing acl check, error log ...) and some refinement in AdminService
Yi Liu created YARN-2993:

Summary: Several fixes (missing acl check, error log ...) and some refinement in AdminService
Key: YARN-2993
URL: https://issues.apache.org/jira/browse/YARN-2993
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu

This JIRA is to resolve the following issues in {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}:

*1.* There is no ACL check for {{refreshServiceAcls}}.

*2.* The log message in {{refreshAdminAcls}} is incorrect; it should be "... Can not refresh Admin ACLs." instead of "... Can not refresh user-groups."

*3.* Some unnecessary imports.

*4.*
{code}
if (!isRMActive()) {
  RMAuditLogger.logFailure(user.getShortUserName(), argName,
      adminAcl.toString(), "AdminService",
      "ResourceManager is not active. Can not remove labels.");
  throwStandbyException();
}
{code}
is common to many methods; only the message differs. We should factor it into one common method.

*5.*
{code}
LOG.info("Exception remove labels", ioe);
RMAuditLogger.logFailure(user.getShortUserName(), argName,
    adminAcl.toString(), "AdminService", "Exception remove label");
throw RPCUtil.getRemoteException(ioe);
{code}
is common to many methods; only the message differs. We should factor it into one common method.
[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated YARN-2993:
-
Attachment: YARN-2993.001.patch

Attach the patch to resolve those issues.
[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258910#comment-14258910 ]

Hadoop QA commented on YARN-2993:
-
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12689142/YARN-2993.001.patch
against trunk revision a164ce2.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 15 new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6193//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6193//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6193//console

This message is automatically generated.
[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately
[ https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258920#comment-14258920 ]

Varun Saxena commented on YARN-2958:

The test failure is unrelated; the test passes locally. The Findbugs warnings are to be addressed by YARN-2938.

RMStateStore seems to unnecessarily and wronly store sequence number separately
---
Key: YARN-2958
URL: https://issues.apache.org/jira/browse/YARN-2958
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
Attachments: YARN-2958.001.patch, YARN-2958.002.patch

It seems that RMStateStore updates the last sequence number whenever it stores or updates an individual DT, so that the latest sequence number can be recovered when the RM restarts.

First, the current logic seems to be problematic:
{code}
public synchronized void updateRMDelegationTokenAndSequenceNumber(
    RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
    int latestSequenceNumber) {
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
    return;
  }
  try {
    updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
        latestSequenceNumber);
  } catch (Exception e) {
    notifyStoreOperationFailed(e);
  }
}
{code}
{code}
@Override
protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) {
  try {
    LOG.info("updating RMDelegation token with sequence number: "
        + id.getSequenceNumber());
    rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
        renewDate, id.getSequenceNumber());
  } catch (Exception e) {
    LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
        + id.getSequenceNumber());
    ExitUtil.terminate(1, e);
  }
}
{code}
According to the code above, the last sequence number is updated in the store even when a DT is merely renewed, which is wrong. For example, take the following sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The stored, and then recovered, last sequence number is 1. As a result, the next DT created after the RM restarts will conflict with DT 2 on sequence number.

Second, the aforementioned bug doesn't actually happen, because the recovered last sequence number is overwritten by the correct one.
{code}
public void recover(RMState rmState) throws Exception {
  LOG.info("recovering RMDelegationTokenSecretManager.");
  // recover RMDTMasterKeys
  for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
      .getMasterKeyState()) {
    addKey(dtKey);
  }
  // recover RMDelegationTokens
  Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
      rmState.getRMDTSecretManagerState().getTokenState();
  this.delegationTokenSequenceNumber =
      rmState.getRMDTSecretManagerState().getDTSequenceNumber();
  for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
      .entrySet()) {
    addPersistedDelegationToken(entry.getKey(), entry.getValue());
  }
}
{code}
The code above recovers delegationTokenSequenceNumber by reading the last sequence number in the store, which could be wrong. Fortunately, delegationTokenSequenceNumber is subsequently updated to the right number:
{code}
if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
  setDelegationTokenSeqNum(identifier.getSequenceNumber());
}
{code}
Every stored identifier is visited, and delegationTokenSequenceNumber ends up set to the largest sequence number among them. Therefore, a new DT is always assigned a sequence number larger than that of every recovered DT.

To sum up, two negatives make a positive, but it would still be good to fix the issue. Please let me know if I've missed something here.
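The "two negatives make a positive" argument above can be replayed on the four-step example from the description. The following is a simplified sketch under stated assumptions, not the Hadoop implementation. `SeqNumRecoverySketch` and its methods are hypothetical stand-ins, tokens are reduced to bare sequence numbers, and `nextSequenceNumber` assumes that a new DT receives the current counter plus one.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the secret manager's recovery path; not the Hadoop class.
class SeqNumRecoverySketch {
    private int delegationTokenSequenceNumber;

    /** Mirrors recover(): seed from the stored value, then replay every stored token. */
    void recover(int storedSequenceNumber, List<Integer> storedTokenSeqNums) {
        delegationTokenSequenceNumber = storedSequenceNumber;
        for (int seq : storedTokenSeqNums) {
            addPersistedDelegationToken(seq);
        }
    }

    /** Mirrors the quoted check: keep the largest sequence number seen so far. */
    private void addPersistedDelegationToken(int seq) {
        if (seq > delegationTokenSequenceNumber) {
            delegationTokenSequenceNumber = seq;
        }
    }

    /** Assumption: a newly created DT gets the current counter plus one. */
    int nextSequenceNumber() {
        return ++delegationTokenSequenceNumber;
    }

    public static void main(String[] args) {
        SeqNumRecoverySketch mgr = new SeqNumRecoverySketch();
        // Get DT 1, get DT 2, renew DT 1: the store holds 1 as the "last" number.
        mgr.recover(1, Arrays.asList(1, 2));
        System.out.println(mgr.nextSequenceNumber()); // prints 3, so no clash with DT 2
    }
}
```

The separately stored value (1) is stale, but replaying the stored tokens raises the counter to 2 before any new DT is issued. In this sketch the separately stored number never influences the result as long as the tokens themselves are recovered.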
[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService
[ https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258929#comment-14258929 ]

Yi Liu commented on YARN-2993:
--
The Findbugs failure is *not related*, and the test failure is also *not related* (https://issues.apache.org/jira/browse/YARN-2991). The patch is straightforward, so no test case is needed.
[jira] [Updated] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated YARN-2807:
---
Attachment: YARN-2807.1.patch

I attached a patch that fixes the usage text of the haadmin options. I also changed the order of options and args for {{-transitionToActive}} and {{-transitionToStandby}}. I think putting options before args is more natural, although the behavior does not depend on the order.

Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

Key: YARN-2807
URL: https://issues.apache.org/jira/browse/YARN-2807
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Wangda Tan
Priority: Critical
Attachments: YARN-2807.1.patch

Currently the help message of yarn rmadmin -transitionToActive is:
{code}
transitionToActive: incorrect number of arguments
Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
{code}
But --forceactive does not work as expected. When transitioning the RM state with --forceactive:
{code}
yarn rmadmin -transitionToActive rm2 --forceactive
Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the forcemanual flag.
{code}
As shown above, we still cannot transition to active with --forceactive when automatic failover is enabled. The option that does work is {{--forcemanual}}, but the usage text does not describe it anywhere. I think we should fix this.