[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors

2014-12-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258726#comment-14258726
 ] 

Hudson commented on YARN-2988:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #786 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/786/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: 
rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java


 Graph#save() may leak file descriptors
 --

 Key: YARN-2988
 URL: https://issues.apache.org/jira/browse/YARN-2988
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2988-001.patch, YARN-2988-002.patch


 In 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java:
 {code}
   public void save(String filepath) throws IOException {
     OutputStreamWriter fout = new OutputStreamWriter(
         new FileOutputStream(filepath), Charset.forName("UTF-8"));
     fout.write(generateGraphViz());
     fout.close();
   }
 {code}
 The close of fout should be enclosed in a finally clause, so that the file 
 descriptor is released even if write() throws.
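
 A minimal sketch of the suggested fix, assuming Java 7 or later so that 
 try-with-resources can stand in for an explicit finally clause. This is not 
 the committed patch: the class name GraphSaveSketch and the stub 
 generateGraphViz() are hypothetical stand-ins for Graph and its real method.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Graph; only the save() logic is of interest.
final class GraphSaveSketch {

  // Assumption: placeholder for Graph#generateGraphViz().
  static String generateGraphViz() {
    return "digraph G { a -> b; }";
  }

  // try-with-resources closes the writer (and the stream it wraps)
  // whether or not write() throws, so the descriptor is not leaked.
  static void save(String filepath) throws IOException {
    try (Writer fout = new OutputStreamWriter(
        new FileOutputStream(filepath), StandardCharsets.UTF_8)) {
      fout.write(generateGraphViz());
    }
  }

  public static void main(String[] args) throws IOException {
    save(args.length > 0 ? args[0] : "graph.gv");
  }
}
```

 On a pre-Java 7 code base the same effect can be had with try { ... } 
 finally { fout.close(); }.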



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors

2014-12-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258734#comment-14258734
 ] 

Hudson commented on YARN-2988:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #52 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/52/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: 
rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java




[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors

2014-12-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258761#comment-14258761
 ] 

Hudson commented on YARN-2988:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1984 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1984/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: 
rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors

2014-12-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258765#comment-14258765
 ] 

Hudson commented on YARN-2988:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #49 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/49/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: 
rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors

2014-12-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258773#comment-14258773
 ] 

Hudson commented on YARN-2988:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #53 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/53/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: 
rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java




[jira] [Commented] (YARN-2988) Graph#save() may leak file descriptors

2014-12-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258786#comment-14258786
 ] 

Hudson commented on YARN-2988:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2003 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2003/])
YARN-2988. Graph#save() may leak file descriptors. (Ted Yu via ozawa) (ozawa: 
rev b256dd76006efbd4bcde3146a642fe0902d83dd2)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/Graph.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2014-12-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2958:
---
Attachment: YARN-2958.001.patch

 RMStateStore seems to unnecessarily and wrongly store sequence number separately
 ---

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch


 RMStateStore appears to update the last sequence number whenever it stores or 
 updates an individual delegation token (DT), so that the latest sequence 
 number can be recovered when the RM restarts.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
       int latestSequenceNumber) {
     if (isFencedState()) {
       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
       return;
     }
     try {
       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier,
           renewDate, latestSequenceNumber);
     } catch (Exception e) {
       notifyStoreOperationFailed(e);
     }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
       long renewDate) {
     try {
       LOG.info("updating RMDelegation token with sequence number: "
           + id.getSequenceNumber());
       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
           renewDate, id.getSequenceNumber());
     } catch (Exception e) {
       LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
           + id.getSequenceNumber());
       ExitUtil.terminate(1, e);
     }
   }
 {code}
 According to the code above, the last sequence number in the store is updated 
 even when a DT is renewed, which is wrong. For example, consider the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored, and then recovered, last sequence number is 1, so the next DT 
 created after the RM restarts would conflict with DT 2 on sequence number.
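
 The four-step sequence above can be reproduced with a small in-memory model. 
 This is only a sketch: SeqNumOverwriteSketch, its fields, and storeOrUpdate() 
 are hypothetical and are not part of RMStateStore, and it assumes the next 
 sequence number is the recovered value plus one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the state store; not the real RMStateStore.
final class SeqNumOverwriteSketch {
  // Token sequence number -> renew date.
  final Map<Integer, Long> tokens = new LinkedHashMap<>();
  // The separately stored "last sequence number".
  int storedLastSeq = 0;

  // Mirrors the quoted behaviour: storing and renewing both write the
  // token's own sequence number as the "latest" one.
  void storeOrUpdate(int seq, long renewDate) {
    tokens.put(seq, renewDate);
    storedLastSeq = seq;
  }

  public static void main(String[] args) {
    SeqNumOverwriteSketch store = new SeqNumOverwriteSketch();
    store.storeOrUpdate(1, 100L); // 1. Get DT 1 (seq = 1)
    store.storeOrUpdate(2, 100L); // 2. Get DT 2 (seq = 2)
    store.storeOrUpdate(1, 200L); // 3. Renew DT 1 (seq = 1)
    // 4. Restart RM: only the separately stored value is read back.
    int recovered = store.storedLastSeq;
    System.out.println("recovered last seq = " + recovered);
    // Assumption: the next DT gets recovered + 1, which is already taken.
    System.out.println("next seq already in use: "
        + store.tokens.containsKey(recovered + 1));
  }
}
```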
 Second, the bug described above does not actually occur, because the 
 recovered last sequence number is later overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
     LOG.info("recovering RMDelegationTokenSecretManager.");
     // recover RMDTMasterKeys
     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
         .getMasterKeyState()) {
       addKey(dtKey);
     }
     // recover RMDelegationTokens
     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
         rmState.getRMDTSecretManagerState().getTokenState();
     this.delegationTokenSequenceNumber =
         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry :
         rmDelegationTokens.entrySet()) {
       addPersistedDelegationToken(entry.getKey(), entry.getValue());
     }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number from the store, which could be wrong. Fortunately, it is 
 subsequently updated to the right number as each persisted token is added:
 {code}
     if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
       setDelegationTokenSeqNum(identifier.getSequenceNumber());
     }
 {code}
 All the stored identifiers are iterated over, and 
 delegationTokenSequenceNumber is set to the largest sequence number 
 among them. Therefore, a new DT is always assigned a sequence number 
 larger than that of every recovered DT.
 To sum up, two negatives make a positive, but it would still be good to fix 
 the issue. Please let me know if I've missed something here.
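
 The compensating behaviour described above can be sketched on its own. The 
 class SeqNumRecoverySketch and its methods are hypothetical, and token 
 identifiers are reduced to plain sequence numbers; the point is only that a 
 stale stored value is corrected by taking the maximum over recovered tokens.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of sequence-number recovery; not the real secret manager.
final class SeqNumRecoverySketch {
  private int seqNum;

  // storedLastSeq may be stale; each recovered token can only raise the counter.
  void recover(int storedLastSeq, List<Integer> recoveredTokenSeqs) {
    seqNum = storedLastSeq;
    for (int seq : recoveredTokenSeqs) {
      if (seq > seqNum) {
        seqNum = seq;
      }
    }
  }

  // Assumption: new tokens take the current counter plus one.
  int nextSeqNum() {
    return ++seqNum;
  }

  public static void main(String[] args) {
    SeqNumRecoverySketch manager = new SeqNumRecoverySketch();
    // Stale stored value 1, but tokens 1 and 2 are both recovered.
    manager.recover(1, Arrays.asList(1, 2));
    System.out.println("next seq = " + manager.nextSeqNum()); // prints "next seq = 3"
  }
}
```

 With this in place the separately stored value adds nothing when the largest 
 token is still present in the store, which is the redundancy the issue title 
 refers to.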





[jira] [Created] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-25 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2992:
--

 Summary: ZKRMStateStore crashes due to session expiry
 Key: YARN-2992
 URL: https://issues.apache.org/jira/browse/YARN-2992
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker


We recently saw the RM crash with the following stacktrace. On session expiry, 
we should gracefully transition to standby. 

{noformat}
2014-12-18 06:28:42,689 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: 
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) 
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) 
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958) 
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687) 
{noformat}





[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2014-12-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258866#comment-14258866
 ] 

Hadoop QA commented on YARN-2958:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689131/YARN-2958.001.patch
  against trunk revision a164ce2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6190//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6190//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6190//console

This message is automatically generated.


[jira] [Updated] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-25 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2992:
---
Attachment: yarn-2992-1.patch

Attaching a patch that also retries on session changes and adds connection 
re-establishment. 

Again, in the longer term, a Curator-based implementation would be both 
simpler and more robust. 
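
The retry-and-reconnect idea can be sketched without any ZooKeeper dependency. 
Everything below is an assumption for illustration and is not the attached 
patch: SessionExpiredException here is a local stand-in for the ZooKeeper 
exception in the stack trace, Reconnector is a hypothetical hook that builds a 
fresh session, and runWithRetries only borrows its name from the frame in the 
trace.

```java
import java.util.concurrent.Callable;

final class RetryOnSessionExpirySketch {

  // Hypothetical stand-in for a ZooKeeper session-expiry error.
  static final class SessionExpiredException extends Exception {
    SessionExpiredException(String message) {
      super(message);
    }
  }

  // Hypothetical hook that establishes a new client connection/session.
  interface Reconnector {
    void reconnect() throws Exception;
  }

  // Runs the action; on session expiry, reconnects and tries again, up to
  // maxRetries times. After that the exception propagates so the caller can
  // decide what to do (for example, transition to standby).
  static <T> T runWithRetries(Callable<T> action, Reconnector reconnector,
      int maxRetries) throws Exception {
    int attempt = 0;
    while (true) {
      try {
        return action.call();
      } catch (SessionExpiredException e) {
        if (++attempt > maxRetries) {
          throw e;
        }
        reconnector.reconnect();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    final int[] reconnects = {0};
    String result = runWithRetries(() -> {
      // Fail once with an expired session, then succeed.
      if (calls[0]++ == 0) {
        throw new SessionExpiredException("Session expired");
      }
      return "stored";
    }, () -> reconnects[0]++, 3);
    System.out.println(result + " after " + reconnects[0] + " reconnect(s)");
  }
}
```

Letting the exception escape once retries are exhausted keeps the choice 
between crashing and transitioning to standby in the caller, which is where 
the issue description says it belongs.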



[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2014-12-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2958:
---
Attachment: YARN-2958.002.patch



[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258875#comment-14258875
 ] 

Hadoop QA commented on YARN-2992:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689132/yarn-2992-1.patch
  against trunk revision a164ce2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6192//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6192//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6192//console

This message is automatically generated.



[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2014-12-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258876#comment-14258876
 ] 

Hadoop QA commented on YARN-2958:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689133/YARN-2958.002.patch
  against trunk revision a164ce2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6191//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6191//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6191//console

This message is automatically generated.

 RMStateStore seems to unnecessarily and wrongly store sequence number separately
 ---

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch



[jira] [Commented] (YARN-2992) ZKRMStateStore crashes due to session expiry

2014-12-25 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258881#comment-14258881
 ] 

Rohith commented on YARN-2992:
--

It makes sense to me. +1 for the issue.
I would also like to bring up the scenario where ZK is not available during 
RM startup. I have observed that the RM exits while starting if ZK is not 
available. Why can't the RM transition to standby instead?

 ZKRMStateStore crashes due to session expiry
 

 Key: YARN-2992
 URL: https://issues.apache.org/jira/browse/YARN-2992
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2992-1.patch


 We recently saw the RM crash with the following stacktrace. On session 
 expiry, we should gracefully transition to standby. 
 {noformat}
 2014-12-18 06:28:42,689 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: 
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired 
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) 
 at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931) 
 at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:930) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:927) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1069) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1088) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:927) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:941) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:958) 
 at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:687) 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

2014-12-25 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-2993:
-
Summary: Several fixes (missing acl check, error log msg ...) and some 
refinement in AdminService  (was: Several fixes (missing acl check, error log 
...) and some refinement in AdminService)

 Several fixes (missing acl check, error log msg ...) and some refinement in 
 AdminService
 

 Key: YARN-2993
 URL: https://issues.apache.org/jira/browse/YARN-2993
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu

 This JIRA is to resolve the following issues in 
 {{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}:
 *1.* There is no ACL check for {{refreshServiceAcls}}.
 *2.* The log message in {{refreshAdminAcls}} is incorrect; it should be "... Can 
 not refresh Admin ACLs." instead of "... Can not refresh user-groups."
 *3.* Some unnecessary imports.
 *4.* {code}
 if (!isRMActive()) {
   RMAuditLogger.logFailure(user.getShortUserName(), argName,
       adminAcl.toString(), "AdminService",
       "ResourceManager is not active. Can not remove labels.");
   throwStandbyException();
 }
 {code}
 is common to many methods; only the message differs. We should refactor 
 it into one common method.
 *5.* {code}
 LOG.info("Exception remove labels", ioe);
 RMAuditLogger.logFailure(user.getShortUserName(), argName,
     adminAcl.toString(), "AdminService", "Exception remove label");
 throw RPCUtil.getRemoteException(ioe);
 {code}
 is common to many methods; only the message differs. We should refactor 
 it into one common method.





[jira] [Created] (YARN-2993) Several fixes (missing acl check, error log ...) and some refinement in AdminService

2014-12-25 Thread Yi Liu (JIRA)
Yi Liu created YARN-2993:


 Summary: Several fixes (missing acl check, error log ...) and some 
refinement in AdminService
 Key: YARN-2993
 URL: https://issues.apache.org/jira/browse/YARN-2993
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu


This JIRA is to resolve the following issues in 
{{org.apache.hadoop.yarn.server.resourcemanager.AdminService}}:

*1.* There is no ACL check for {{refreshServiceAcls}}.

*2.* The log message in {{refreshAdminAcls}} is incorrect; it should be "... Can 
not refresh Admin ACLs." instead of "... Can not refresh user-groups."

*3.* Some unnecessary imports.

*4.* {code}
if (!isRMActive()) {
  RMAuditLogger.logFailure(user.getShortUserName(), argName,
      adminAcl.toString(), "AdminService",
      "ResourceManager is not active. Can not remove labels.");
  throwStandbyException();
}
{code}
is common to many methods; only the message differs. We should refactor 
it into one common method.

*5.* {code}
LOG.info("Exception remove labels", ioe);
RMAuditLogger.logFailure(user.getShortUserName(), argName,
    adminAcl.toString(), "AdminService", "Exception remove label");
throw RPCUtil.getRemoteException(ioe);
{code}
is common to many methods; only the message differs. We should refactor 
it into one common method.
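
As a rough illustration of items 4 and 5, the repeated blocks could be pulled into two helpers along the following lines. This is only a sketch: the class, the helper names {{checkRMStatus}} and {{logAndWrapException}}, the in-memory audit list, and the use of RuntimeException as the wrapper are all assumptions made for illustration. They are simplified stand-ins and not the real AdminService, RMAuditLogger, or RPCUtil code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of items 4 and 5. Everything here is a simplified
 * stand-in, not the real AdminService, RMAuditLogger, or RPCUtil.
 */
public class AdminServiceSketch {
  /** Stand-in for the exception thrown when the RM is in standby. */
  public static class StandbyException extends Exception {
    public StandbyException(String msg) {
      super(msg);
    }
  }

  /** Stand-in for the audit log: failures are recorded in memory. */
  public final List<String> auditFailures = new ArrayList<>();

  /** Stand-in for isRMActive(). */
  public boolean rmActive = true;

  /** Item 4: one shared "is the RM active?" check; only the action text varies. */
  public void checkRMStatus(String user, String argName, String action)
      throws StandbyException {
    if (!rmActive) {
      String msg = "ResourceManager is not active. Can not " + action;
      auditFailures.add(user + " " + argName + ": " + msg);
      throw new StandbyException(msg);
    }
  }

  /** Item 5: one shared log-and-wrap step; only the message varies. */
  public RuntimeException logAndWrapException(Exception cause, String user,
      String argName, String msg) {
    auditFailures.add(user + " " + argName + ": " + msg);
    // The real code wraps via RPCUtil; RuntimeException is used here
    // only to keep the sketch self-contained.
    return new RuntimeException(msg, cause);
  }

  public static void main(String[] args) {
    AdminServiceSketch admin = new AdminServiceSketch();
    admin.rmActive = false;
    try {
      admin.checkRMStatus("alice", "removeLabels", "remove labels.");
    } catch (StandbyException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With helpers of this shape, each admin method would pass only its own message text, and the audit-log and exception handling would live in one place.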





[jira] [Updated] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

2014-12-25 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-2993:
-
Attachment: YARN-2993.001.patch

Attached a patch to resolve these issues.

 Several fixes (missing acl check, error log msg ...) and some refinement in 
 AdminService
 

 Key: YARN-2993
 URL: https://issues.apache.org/jira/browse/YARN-2993
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2993.001.patch




[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

2014-12-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258910#comment-14258910
 ] 

Hadoop QA commented on YARN-2993:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12689142/YARN-2993.001.patch
  against trunk revision a164ce2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 15 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6193//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6193//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6193//console

This message is automatically generated.

 Several fixes (missing acl check, error log msg ...) and some refinement in 
 AdminService
 

 Key: YARN-2993
 URL: https://issues.apache.org/jira/browse/YARN-2993
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2993.001.patch




[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wronly store sequence number separately

2014-12-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258920#comment-14258920
 ] 

Varun Saxena commented on YARN-2958:


The test failure is unrelated; the test passes locally. The Findbugs warnings 
will be addressed by YARN-2938.

 RMStateStore seems to unnecessarily and wronly store sequence number 
 separately
 ---

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch


 It seems that RMStateStore updates the last sequence number whenever it stores 
 or updates an individual DT, so that the latest sequence number can be 
 recovered when the RM restarts.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
       int latestSequenceNumber) {
     if (isFencedState()) {
       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
       return;
     }
     try {
       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier,
           renewDate, latestSequenceNumber);
     } catch (Exception e) {
       notifyStoreOperationFailed(e);
     }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
       long renewDate) {
     try {
       LOG.info("updating RMDelegation token with sequence number: "
           + id.getSequenceNumber());
       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
           renewDate, id.getSequenceNumber());
     } catch (Exception e) {
       LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
           + id.getSequenceNumber());
       ExitUtil.terminate(1, e);
     }
   }
 {code}
 According to the code above, even when a DT is renewed, the last sequence 
 number is updated in the store, which is wrong. For example, consider the 
 following sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored and then recovered last sequence number is 1. As a result, the next 
 DT created after the RM restarts would conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
     LOG.info("recovering RMDelegationTokenSecretManager.");
     // recover RMDTMasterKeys
     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
         .getMasterKeyState()) {
       addKey(dtKey);
     }
     // recover RMDelegationTokens
     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
         rmState.getRMDTSecretManagerState().getTokenState();
     this.delegationTokenSequenceNumber =
         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry :
         rmDelegationTokens.entrySet()) {
       addPersistedDelegationToken(entry.getKey(), entry.getValue());
     }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
 addPersistedDelegationToken then updates delegationTokenSequenceNumber to the 
 right number.
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers are iterated over, and 
 delegationTokenSequenceNumber is set to the largest sequence number 
 among them. Therefore, a new DT will always be assigned a sequence number 
 that is larger than those of all the recovered DTs.
 To sum up, two negatives make a positive, but it's still good to fix the issue. 
 Please let me know if I've missed something here.
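
The four-step sequence and the recovery behaviour described above can be replayed with a small standalone model. This is a minimal sketch and not Hadoop code: the class {{SeqNumRecoveryDemo}}, its nested {{Store}}, and all method names are hypothetical. They only mimic the two behaviours discussed, namely that every store or renew overwrites the separately stored last sequence number, and that recovery raises the number to the largest one found among the persisted tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the behaviour described above; not Hadoop code. */
public class SeqNumRecoveryDemo {

  /** Stands in for the state store: token sequence numbers plus the separately stored "last" number. */
  public static final class Store {
    public final List<Integer> tokenSeqNums = new ArrayList<>();
    public int lastSeqNum = 0;

    /** Storing a new DT persists it and overwrites the last sequence number. */
    public void storeToken(int seq) {
      tokenSeqNums.add(seq);
      lastSeqNum = seq;
    }

    /** Renewing a DT also overwrites the last sequence number (the first problem). */
    public void renewToken(int seq) {
      lastSeqNum = seq;
    }
  }

  /** Mimics recovery: start from the stored number, then take the max over persisted tokens. */
  public static int recover(Store store) {
    int seq = store.lastSeqNum;
    for (int tokenSeq : store.tokenSeqNums) {
      if (tokenSeq > seq) {
        seq = tokenSeq;
      }
    }
    return seq;
  }

  public static void main(String[] args) {
    Store store = new Store();
    store.storeToken(1); // 1. Get DT 1 (seq = 1)
    store.storeToken(2); // 2. Get DT 2 (seq = 2)
    store.renewToken(1); // 3. Renew DT 1 (seq = 1)
    // 4. Restart RM: the stored number is stale, but recovery corrects it.
    System.out.println("stored last seq = " + store.lastSeqNum); // prints 1
    System.out.println("recovered seq   = " + recover(store));   // prints 2
  }
}
```

In this model the separately stored number contributes nothing after recovery, which matches the point that storing it is unnecessary.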





[jira] [Commented] (YARN-2993) Several fixes (missing acl check, error log msg ...) and some refinement in AdminService

2014-12-25 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258929#comment-14258929
 ] 

Yi Liu commented on YARN-2993:
--

The Findbugs failure is *not related*, and the test failure is also *not related* 
(https://issues.apache.org/jira/browse/YARN-2991).
The patch is straightforward, so no test case is needed.

 Several fixes (missing acl check, error log msg ...) and some refinement in 
 AdminService
 

 Key: YARN-2993
 URL: https://issues.apache.org/jira/browse/YARN-2993
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2993.001.patch




[jira] [Updated] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2014-12-25 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-2807:
---
Attachment: YARN-2807.1.patch

I attached a patch that fixes the usage message for the haadmin options. I also 
changed the order of options and args for {{-transitionToActive}} and 
{{-transitionToStandby}}. I think putting options before args is more natural, 
though the behavior does not depend on the order.

 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Priority: Critical
 Attachments: YARN-2807.1.patch


 Currently, the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state 
 with --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, even with --forceactive we still cannot transition to active 
 when automatic failover is enabled.
 The option that does work is {{--forcemanual}}, but the usage message does not 
 describe it anywhere. I think we should fix this.


