[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3010:
-
Description: 
A new findbugs issue was reported recently: 
{quote}
IS  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
 locked 91% of time
{quote}
https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html

  was:A new findbugs issue was reported recently: 
https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor

 A new findbugs issue was reported recently: 
 {quote}
 IS  Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
  locked 91% of time
 {quote}
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
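 A minimal illustrative sketch of the kind of fix such a warning usually calls 
 for (the class and field below are stand-ins, not the actual YARN-3010 patch): 
 guard every read and write of the field with the same lock, or drop the 
 locking entirely if it is unnecessary.
 {code}
 class SchedulerContextHolder {
   private Object rmContext;                     // stand-in for the RMContext field

   synchronized void setRMContext(Object ctx) {  // write under the monitor
     this.rmContext = ctx;
   }

   synchronized Object getRMContext() {          // read under the same monitor
     return rmContext;
   }
 }
 {code}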



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3010:
-
Attachment: YARN-3010.001.patch

 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Attachments: YARN-3010.001.patch


 A new findbugs issue was reported recently in the latest trunk: 
 {quote}
 IS  Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
  locked 91% of time
 {quote}
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265760#comment-14265760
 ] 

Hadoop QA commented on YARN-2996:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690261/YARN-2996.002.patch
  against trunk revision 4cd66f7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6249//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6249//console

This message is automatically generated.

 Refine some fs operations in FileSystemRMStateStore to improve performance
 --

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2996.001.patch, YARN-2996.002.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* There are several places that invoke {{fs.exists}} and then 
 {{fs.getFileStatus}}; we can merge them to save one RPC call (see the sketch 
 after this description)
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + 
 ".new");
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 The {{updateFile}} logic is also not ideal: it writes the file to 
 _output\_file_.tmp, renames that to _output\_file_.new, and then renames it to 
 _output\_file_; we can drop one of these rename operations.
 Also there is one unnecessary import that we can remove.
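 A possible shape for point *1.* (an assumed helper, not necessarily how the 
 attached patch does it): one {{getFileStatus}} call replaces the {{exists}} + 
 {{getFileStatus}} pair, and a missing path surfaces as 
 {{FileNotFoundException}} instead of costing a second RPC.
 {code}
 import java.io.FileNotFoundException;
 import java.io.IOException;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 class FsStatusSketch {
   static FileStatus getFileStatusOrNull(FileSystem fs, Path path)
       throws IOException {
     try {
       return fs.getFileStatus(path);   // single round trip to the NameNode
     } catch (FileNotFoundException e) {
       return null;                     // same meaning as fs.exists(path) == false
     }
   }
 }
 {code}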



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-05 Thread Yi Liu (JIRA)
Yi Liu created YARN-3010:


 Summary: Fix recent findbug issue in AbstractYarnScheduler
 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor


A new findbugs issue was reported recently: 
https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3010:
-
Description: 
A new findbugs issue was reported recently in the latest trunk: 
{quote}
IS  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
 locked 91% of time
{quote}
https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html

  was:
A new findbugs issue was reported recently: 
{quote}
IS  Inconsistent synchronization of 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
 locked 91% of time
{quote}
https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor

 A new findbugs issue was reported recently in the latest trunk: 
 {quote}
 IS  Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
  locked 91% of time
 {quote}
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3004) Fix missed synchronization in MemoryRMStateStore

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264437#comment-14264437
 ] 

Hadoop QA commented on YARN-3004:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690050/YARN-3004.001.patch
  against trunk revision 21c6f01.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6242//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6242//console

This message is automatically generated.

 Fix missed synchronization in MemoryRMStateStore
 

 Key: YARN-3004
 URL: https://issues.apache.org/jira/browse/YARN-3004
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3004.001.patch


 In {{MemoryRMStateStore}}, the {{state}} variable clearly needs to be accessed 
 in a thread-safe way, so we should add _synchronized_ to
 {code}
 storeApplicationStateInternal
 updateApplicationStateInternal
 storeOrUpdateAMRMTokenSecretManagerState
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts

2015-01-05 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264500#comment-14264500
 ] 

Rohith commented on YARN-2684:
--

IIUC, to simulate this, the queue placement policy should be changed. I was able 
to reproduce it using the rule *reject*. From my reading of the FairScheduler 
code, no rule other than *reject* rejects applications. Will upload a patch to 
fix the issue soon.

 FairScheduler should tolerate queue configuration changes across RM restarts
 

 Key: YARN-2684
 URL: https://issues.apache.org/jira/browse/YARN-2684
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Rohith
Priority: Critical

 YARN-2308 fixes this issue for CS; this JIRA is to fix it for FS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3004) Fix missed synchronization in MemoryRMStateStore

2015-01-05 Thread Yi Liu (JIRA)
Yi Liu created YARN-3004:


 Summary: Fix missed synchronization in MemoryRMStateStore
 Key: YARN-3004
 URL: https://issues.apache.org/jira/browse/YARN-3004
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu


In {{MemoryRMStateStore}}, the {{state}} variable clearly needs to be accessed in 
a thread-safe way, so we should add _synchronized_ to
{code}
storeApplicationStateInternal
updateApplicationStateInternal
storeOrUpdateAMRMTokenSecretManagerState
{code}
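A minimal sketch of the proposed change (signatures simplified; the real methods 
take YARN record types, and this is not the attached patch): marking the mutating 
methods synchronized serializes concurrent updates on the store instance, 
matching the methods that are already synchronized.
{code}
import java.util.HashMap;
import java.util.Map;

class MemoryStoreSketch {
  private final Map<String, byte[]> state = new HashMap<String, byte[]>();

  synchronized void storeApplicationStateInternal(String appId, byte[] data) {
    state.put(appId, data);             // now guarded by the store's monitor
  }

  synchronized void updateApplicationStateInternal(String appId, byte[] data) {
    state.put(appId, data);
  }
}
{code}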



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3004) Fix missed synchronization in MemoryRMStateStore

2015-01-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-3004:
-
Attachment: YARN-3004.001.patch

 Fix missed synchronization in MemoryRMStateStore
 

 Key: YARN-3004
 URL: https://issues.apache.org/jira/browse/YARN-3004
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-3004.001.patch


 In {{MemoryRMStateStore}}, the {{state}} variable clearly needs to be accessed 
 in a thread-safe way, so we should add _synchronized_ to
 {code}
 storeApplicationStateInternal
 updateApplicationStateInternal
 storeOrUpdateAMRMTokenSecretManagerState
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh

2015-01-05 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264371#comment-14264371
 ] 

Jeff Zhang commented on YARN-3000:
--

It makes sense to deprecate YARN_PID_DIR if HADOOP_PID_DIR is used for YARN in 
trunk. 

 YARN_PID_DIR should be visible in yarn-env.sh
 -

 Key: YARN-3000
 URL: https://issues.apache.org/jira/browse/YARN-3000
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.6.0
Reporter: Jeff Zhang
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3000.patch


 Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the place 
 where users are supposed to set environment variables. IMO, yarn-env.sh is the 
 place for users to set environment variables, just like hadoop-env.sh, so it's 
 better to put YARN_PID_DIR into yarn-env.sh (it can be added as a comment, just 
 like YARN_RESOURCEMANAGER_HEAPSIZE).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh

2015-01-05 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264373#comment-14264373
 ] 

Jeff Zhang commented on YARN-3000:
--

BTW, what's the JIRA for making HADOOP_PID_DIR replace YARN_PID_DIR? As far as I 
know, hadoop-2.6 didn't do that. 

 YARN_PID_DIR should be visible in yarn-env.sh
 -

 Key: YARN-3000
 URL: https://issues.apache.org/jira/browse/YARN-3000
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.6.0
Reporter: Jeff Zhang
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3000.patch


 Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the place 
 where users are supposed to set environment variables. IMO, yarn-env.sh is the 
 place for users to set environment variables, just like hadoop-env.sh, so it's 
 better to put YARN_PID_DIR into yarn-env.sh (it can be added as a comment, just 
 like YARN_RESOURCEMANAGER_HEAPSIZE).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3002) YARN documentation needs updating post-shell rewrite

2015-01-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264502#comment-14264502
 ] 

Steve Loughran commented on YARN-3002:
--

+1

 YARN documentation needs updating post-shell rewrite
 

 Key: YARN-3002
 URL: https://issues.apache.org/jira/browse/YARN-3002
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
 Attachments: YARN-3002-00.patch


 After HADOOP-9902, the YARN documentation is out of date. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts

2015-01-05 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264517#comment-14264517
 ] 

Rohith commented on YARN-2684:
--

Attached a patch fixing this issue. Kindly review the patch.

 FairScheduler should tolerate queue configuration changes across RM restarts
 

 Key: YARN-2684
 URL: https://issues.apache.org/jira/browse/YARN-2684
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2684.patch


 YARN-2308 fixes this issue for CS; this JIRA is to fix it for FS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts

2015-01-05 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2684:
-
Attachment: 0001-YARN-2684.patch

 FairScheduler should tolerate queue configuration changes across RM restarts
 

 Key: YARN-2684
 URL: https://issues.apache.org/jira/browse/YARN-2684
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2684.patch


 YARN-2308 fixes this issue for CS; this JIRA is to fix it for FS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2625) Problems with CLASSPATH in Job Submission REST API

2015-01-05 Thread Doug Haigh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264577#comment-14264577
 ] 

Doug Haigh commented on YARN-2625:
--

I am not using MR. I am writing my own AppMaster which requires knowing the 
path to the Hadoop ecosystem's CLASSPATH. Unless you expect Hadoop to be 
rewritten in something other than Java, most AppMaster jars written in Java 
will be required to know the CLASSPATH of the Hadoop ecosystem they are 
expected to run under. If you want to add a REST API to specifically get the 
CLASSPATH of the Hadoop ecosystem a Java AppMaster will run under, that is fine 
with me.

 Problems with CLASSPATH in Job Submission REST API
 --

 Key: YARN-2625
 URL: https://issues.apache.org/jira/browse/YARN-2625
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.5.1
Reporter: Doug Haigh

 There are a couple of issues I have found specifying the CLASSPATH 
 environment variable using the REST API.
 1) In the Java client, the CLASSPATH environment is usually made up of either 
 the value of the yarn.application.classpath in yarn-site.xml value or the 
 default YARN classpath value as defined by 
 YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH. REST API 
 consumers have no method of telling the resource manager to use the default 
 unless they hardcode the default value themselves. If the default ever 
 changes, the code would need to change. 
 2) If any environment variables are used in the CLASSPATH environment 'value' 
 field, they are evaluated while those variables are still NULL, resulting in bad values in 
 the CLASSPATH. For example, if I had hardcoded the CLASSPATH value to the 
 default of $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, 
 $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/lib/* the classpath passed to the 
 application master is 
 :/share/hadoop/common/*:/share/hadoop/common/lib/*:/share/hadoop/hdfs/*:/share/hadoop/hdfs/lib/*:/share/hadoop/yarn/*:/share/hadoop/yarn/lib/*
 These two problems require REST API consumers to always have the fully 
 resolved path defined in the yarn.application.classpath value. If the 
 property is missing or contains environment variables, the application 
 created by the REST API will fail due to the CLASSPATH being incorrect.
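 For context, a rough sketch of what a Java client does today and what a REST 
 consumer currently has to reproduce by hand ({{conf}} is assumed to be a 
 YarnConfiguration, {{env}} a Map holding the container environment, and 
 {{Environment}} is {{ApplicationConstants.Environment}}):
 {code}
 String[] cp = conf.getTrimmedStrings(
     YarnConfiguration.YARN_APPLICATION_CLASSPATH,
     YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH);
 StringBuilder classpath = new StringBuilder(Environment.PWD.$$());
 for (String entry : cp) {
   // cross-platform separator so the variables expand on the target node
   classpath.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append(entry.trim());
 }
 env.put(Environment.CLASSPATH.name(), classpath.toString());
 {code}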



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2684) FairScheduler should tolerate queue configuration changes across RM restarts

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264581#comment-14264581
 ] 

Hadoop QA commented on YARN-2684:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690068/0001-YARN-2684.patch
  against trunk revision 21c6f01.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6243//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6243//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6243//console

This message is automatically generated.

 FairScheduler should tolerate queue configuration changes across RM restarts
 

 Key: YARN-2684
 URL: https://issues.apache.org/jira/browse/YARN-2684
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2684.patch


 YARN-2308 fixes this issue for CS; this JIRA is to fix it for FS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page

2015-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264567#comment-14264567
 ] 

Karthik Kambatla commented on YARN-2360:


Thanks for noticing the missing commit, Peng. Just committed to branch-2. It 
should be part of 2.7. 

 Fair Scheduler: Display dynamic fair share for queues on the scheduler page
 ---

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Fix For: 2.6.0

 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page

2015-01-05 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2360:
---
Fix Version/s: (was: 2.6.0)
   2.7.0

 Fair Scheduler: Display dynamic fair share for queues on the scheduler page
 ---

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Fix For: 2.7.0

 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info

2015-01-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264589#comment-14264589
 ] 

Varun Saxena commented on YARN-2978:


Kindly review. 

 ResourceManager crashes with NPE while getting queue info
 -

 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena
Priority: Critical
  Labels: capacityscheduler, resourcemanager
 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, 
 YARN-2978.003.patch


  java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1177) Support automatic failover using ZKFC

2015-01-05 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reassigned YARN-1177:
---

Assignee: Akira AJISAKA  (was: Wei Yan)

 Support automatic failover using ZKFC
 -

 Key: YARN-1177
 URL: https://issues.apache.org/jira/browse/YARN-1177
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Akira AJISAKA
 Attachments: yarn-1177-ancient-version.patch


 Prior to embedding leader election and failover controller in the RM 
 (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic 
 failover implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3005:
---
Assignee: (was: Varun Saxena)

 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Akira AJISAKA
Priority: Trivial
  Labels: newbie

 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
   + "\"" + auth + "\"");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-05 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-3005:
---

 Summary: [JDK7] Use switch statement for String instead of if-else 
statement in RegistrySecurity.java
 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Akira AJISAKA
Priority: Trivial


Since we have moved to JDK7, we can refactor the below if-else statement for 
String.
{code}
// TODO JDK7 SWITCH
if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
  access = AccessPolicy.sasl;
} else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
  access = AccessPolicy.digest;
} else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
  access = AccessPolicy.anon;
} else {
  throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
  + "\"" + auth + "\"");
}
{code}
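For reference, the refactored form would look roughly like this (same constants 
as above; note that, unlike the equals() chain, switching on a null {{auth}} 
throws NullPointerException, so a null check may still be needed):
{code}
switch (auth) {
  case REGISTRY_CLIENT_AUTH_KERBEROS:
    access = AccessPolicy.sasl;
    break;
  case REGISTRY_CLIENT_AUTH_DIGEST:
    access = AccessPolicy.digest;
    break;
  case REGISTRY_CLIENT_AUTH_ANONYMOUS:
    access = AccessPolicy.anon;
    break;
  default:
    throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
        + "\"" + auth + "\"");
}
{code}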



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1177) Support automatic failover using ZKFC

2015-01-05 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264785#comment-14264785
 ] 

Akira AJISAKA commented on YARN-1177:
-

Hi [~ywskycn], how is this issue going? I'd like to take it over.
Automatic failover is already supported; however, when executing a graceful 
failover from the command line, UnsupportedOperationException is thrown.
{code:title=RMHAServiceTarget.java}
  @Override
  public InetSocketAddress getZKFCAddress() {
// TODO (YARN-1177): ZKFC implementation
throw new UnsupportedOperationException("RMHAServiceTarget doesn't have " +
"a corresponding ZKFC address");
  }
{code}

 Support automatic failover using ZKFC
 -

 Key: YARN-1177
 URL: https://issues.apache.org/jira/browse/YARN-1177
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Wei Yan
 Attachments: yarn-1177-ancient-version.patch


 Prior to embedding leader election and failover controller in the RM 
 (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic 
 failover implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1177) Support automatic failover using ZKFC

2015-01-05 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264789#comment-14264789
 ] 

Wei Yan commented on YARN-1177:
---

[~ajisakaa], feel free to grab. thanks.

 Support automatic failover using ZKFC
 -

 Key: YARN-1177
 URL: https://issues.apache.org/jira/browse/YARN-1177
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Wei Yan
 Attachments: yarn-1177-ancient-version.patch


 Prior to embedding leader election and failover controller in the RM 
 (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic 
 failover implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-01-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-3005:
--

Assignee: Varun Saxena

 [JDK7] Use switch statement for String instead of if-else statement in 
 RegistrySecurity.java
 

 Key: YARN-3005
 URL: https://issues.apache.org/jira/browse/YARN-3005
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Akira AJISAKA
Assignee: Varun Saxena
Priority: Trivial
  Labels: newbie

 Since we have moved to JDK7, we can refactor the below if-else statement for 
 String.
 {code}
 // TODO JDK7 SWITCH
 if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
   access = AccessPolicy.sasl;
 } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
   access = AccessPolicy.digest;
 } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
   access = AccessPolicy.anon;
 } else {
   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
   + "\"" + auth + "\"");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer

2015-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265163#comment-14265163
 ] 

Karthik Kambatla commented on YARN-2919:


[~Naganarasimha] - I am on vacation and will not be able to look into this at 
least until next week. Sorry for the inconvenience. 

 Potential race between renew and cancel in DelegationTokenRenwer 
 -

 Key: YARN-2919
 URL: https://issues.apache.org/jira/browse/YARN-2919
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Naganarasimha G R
Priority: Critical
 Attachments: YARN-2919.20141209-1.patch


 YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a 
 race because of which a renewal in flight isn't interrupted by a cancel. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled

2015-01-05 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265172#comment-14265172
 ] 

Masatake Iwasaki commented on YARN-3006:


Is YARN-2807 relevant?

 Improve the error message when attempting manual failover with auto-failover 
 enabled
 

 Key: YARN-3006
 URL: https://issues.apache.org/jira/browse/YARN-3006
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor

 When executing manual failover with automatic failover enabled, 
 UnsupportedOperationException is thrown.
 {code}
 # yarn rmadmin -failover rm1 rm2
 Exception in thread "main" java.lang.UnsupportedOperationException: 
 RMHAServiceTarget doesn't have a corresponding ZKFC address
   at 
 org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
   at 
 org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
   at 
 org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311)
   at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622)
 {code}
 I think the above message is confusing to users (they may wonder whether ZKFC 
 is configured correctly...). The command should output an error message to 
 stderr instead of throwing an exception, as sketched below.
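 An illustrative sketch of that suggestion (the wrapper method and the wrapped 
 call are hypothetical, not the RMAdminCLI API):
 {code}
 static int runManualFailover(Runnable failoverCall) {
   try {
     failoverCall.run();   // existing path that may end up in getZKFCAddress()
     return 0;
   } catch (UnsupportedOperationException e) {
     // print a plain message instead of letting the stack trace escape
     System.err.println("failover: automatic failover is enabled for the RM; "
         + "manual (graceful) failover via ZKFC is not supported: "
         + e.getMessage());
     return -1;
   }
 }
 {code}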



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265186#comment-14265186
 ] 

Jian He commented on YARN-2230:
---

[~kawaa], thanks for your patch!
looks good overall. Would you mind fixing the doc for the memory and also for the 
min-allocation too? thx

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
 resReq.getCapability().getVirtualCores() >
 maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
   + ", requested virtual cores < 0"
   + ", or requested virtual cores > max configured"
   + ", requestedVirtualCores="
   + resReq.getCapability().getVirtualCores()
   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not); one 
 possible capping change is sketched below.
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) 
 is submitted, it does not make any progress. The warnings/exceptions are thrown 
 at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.
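 A sketch of the capping option mentioned above (aligning the code with the 
 yarn-default.xml description; this is only one possible resolution, not an 
 attached patch):
 {code}
 int requested = resReq.getCapability().getVirtualCores();
 int maxVcores = maximumResource.getVirtualCores();
 if (requested > maxVcores) {
   // cap to the configured maximum instead of throwing
   resReq.getCapability().setVirtualCores(maxVcores);
 }
 {code}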



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2360) Fair Scheduler: Display dynamic fair share for queues on the scheduler page

2015-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265208#comment-14265208
 ] 

Hudson commented on YARN-2360:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6809 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6809/])
Move YARN-2360 from 2.6 to 2.7 in CHANGES.txt (kasha: rev 
41d72cbd48e6df7be3d177eaf04d73e88cf38381)
* hadoop-yarn-project/CHANGES.txt


 Fair Scheduler: Display dynamic fair share for queues on the scheduler page
 ---

 Key: YARN-2360
 URL: https://issues.apache.org/jira/browse/YARN-2360
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Fix For: 2.7.0

 Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, 
 Screen_Shot_v3.png, Screen_Shot_v4.png, Screen_Shot_v5.png, YARN-2360-v1.txt, 
 YARN-2360-v2.txt, YARN-2360-v3.patch, YARN-2360-v4.patch, YARN-2360-v5.patch, 
 yarn-2360-6.patch


 Based on the discussion in YARN-2026,  we'd like to display dynamic fair 
 share for queues on the scheduler page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler

2015-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265259#comment-14265259
 ] 

Karthik Kambatla commented on YARN-2881:


I can remove the TODO at commit time. 

+1 for the patch, committing it now.

 Implement PlanFollower for FairScheduler
 

 Key: YARN-2881
 URL: https://issues.apache.org/jira/browse/YARN-2881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, 
 YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, 
 YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-05 Thread Vijay Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265276#comment-14265276
 ] 

Vijay Bhat commented on YARN-2230:
--

Jian,

I (Vijay Bhat) can definitely take care of that. Also, since this is my first 
submission, I wanted to clarify - is the protocol that I assign the JIRA to 
myself once I submit the patch? Apologies for any confusion. Thanks!


 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
 resReq.getCapability().getVirtualCores() >
 maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
   + ", requested virtual cores < 0"
   + ", or requested virtual cores > max configured"
   + ", requestedVirtualCores="
   + resReq.getCapability().getVirtualCores()
   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) 
 is submitted, it does not make any progress. The warnings/exceptions are thrown 
 at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3008) FairScheduler: Use lock for queuemanager instead of synchronized on FairScheduler

2015-01-05 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-3008:
---

 Summary: FairScheduler: Use lock for queuemanager instead of 
synchronized on FairScheduler
 Key: YARN-3008
 URL: https://issues.apache.org/jira/browse/YARN-3008
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot


Instead of a big monolithic lock on FairScheduler, we can have an explicit lock 
on the QueueManager and revisit all the synchronized methods in FairScheduler.
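A minimal sketch of the idea (class and method names are illustrative, not the 
eventual patch): a dedicated read/write lock guards the queue structures instead 
of the FairScheduler monitor.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class QueueManagerLockSketch {
  private final ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();
  private final Map<String, Object> queues = new HashMap<String, Object>();

  void addQueue(String name, Object queue) {
    queueLock.writeLock().lock();
    try {
      queues.put(name, queue);          // mutations take the write lock
    } finally {
      queueLock.writeLock().unlock();
    }
  }

  Object getQueue(String name) {
    queueLock.readLock().lock();
    try {
      return queues.get(name);          // lookups only need the read lock
    } finally {
      queueLock.readLock().unlock();
    }
  }
}
{code}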



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265254#comment-14265254
 ] 

Karthik Kambatla commented on YARN-2217:


Can we also add error cases to the tests? I understand the methods are very 
simple, but it would be nice to catch changes in behavior. 

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265261#comment-14265261
 ] 

Hadoop QA commented on YARN-2978:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690124/YARN-2978.004.patch
  against trunk revision dfd2589.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6246//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6246//console

This message is automatically generated.

 ResourceManager crashes with NPE while getting queue info
 -

 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena
Priority: Critical
  Labels: capacityscheduler, resourcemanager
 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, 
 YARN-2978.003.patch, YARN-2978.004.patch


  java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services

2015-01-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265310#comment-14265310
 ] 

Zhijie Shen commented on YARN-2427:
---

Thanks for the patch, Varun! Some comments:

1. Is a GET operation necessary for the queue info alone? We already provide a 
get-app API, which contains the queue information. One could argue that this 
response is smaller in terms of message size, but the full app metadata should 
not be that bad, should it?
{code}
+  @GET
+  @Path("/apps/{appid}/queue")
+  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
+  public AppQueue getAppQueue(@Context HttpServletRequest hsr,
+  @PathParam("appid") String appId) throws AuthorizationException {
{code}

2. I assume similar logic should be done in 
ClientRMService#moveApplicationAcrossQueues. If not, that method needs to be 
fixed. Here, we may not want to do the sanity check twice.
{code}
+String userName = callerUGI.getUserName();
+RMApp app = null;
+try {
+  app = getRMAppForAppId(appId);
+} catch (NotFoundException e) {
+  RMAuditLogger.logFailure(userName, AuditConstants.KILL_APP_REQUEST,
+"UNKNOWN", "RMWebService", "Trying to move an absent application "
++ appId);
+  throw e;
+}
+
+if (!app.getQueue().equals(targetQueue.getQueue())) {
+  // user is attempting to change queue.
+  return moveApp(app, callerUGI, targetQueue.getQueue());
+}
{code}

3. If we avoided the logic in (2), we may have to handle 
ApplicationNotFoundException from ClientRMService#moveApplicationAcrossQueues 
and map it to NotFoundException around the following code.
{code}
+  if (ue.getCause() instanceof YarnException) {
+YarnException ye = (YarnException) ue.getCause();
{code}

4. Make it protected static?
{code}
+  String appQueueToJSON(AppQueue targetQueue) throws Exception {
{code}

5. JAXBContextResolver needs to add AppQueue.

6. Is the code change in TestFifoScheduler and testAppSubmit necessary?

 Add support for moving apps between queues in RM web services
 -

 Key: YARN-2427
 URL: https://issues.apache.org/jira/browse/YARN-2427
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, 
 apache-yarn-2427.2.patch, apache-yarn-2427.3.patch


 Support for moving apps from one queue to another is now present in 
 CapacityScheduler and FairScheduler. We should expose the functionality via 
 RM web services as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2574) Add support for FairScheduler to the ReservationSystem

2015-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265318#comment-14265318
 ] 

Hudson commented on YARN-2574:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6812 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6812/])
YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot 
via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java


 Add support for FairScheduler to the ReservationSystem
 --

 Key: YARN-2574
 URL: https://issues.apache.org/jira/browse/YARN-2574
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: fairscheduler
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot

 YARN-1051 introduces the ReservationSystem and the current implementation is 
 based on CapacityScheduler. This JIRA proposes adding support for 
 FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler

2015-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265317#comment-14265317
 ] 

Hudson commented on YARN-2881:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6812 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6812/])
YARN-2881. [YARN-2574] Implement PlanFollower for FairScheduler. (Anubhav Dhoot 
via kasha) (kasha: rev 0c4b11267717eb451fa6ed4c586317f2db32fbd5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ReservationQueueConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemTestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairSchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacitySchedulerPlanFollower.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/FairSchedulerPlanFollower.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerDynamicBehavior.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/PlanQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java


 Implement PlanFollower for FairScheduler
 

 Key: YARN-2881
 URL: https://issues.apache.org/jira/browse/YARN-2881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, 
 YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, 
 YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265154#comment-14265154
 ] 

Hudson commented on YARN-2958:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6808 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6808/])
YARN-2958. Made RMStateStore not update the last sequence number when updating 
the delegation token. Contributed by Varun Saxena. (zjshen: rev 
562a701945be3a672f9cb5a52cc6db2c1589ba2b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreRMDTEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/LeveldbRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMDelegationTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java


 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch, YARN-2958.004.patch


 It seems that RMStateStore updates the last sequence number when storing or 
 updating each individual DT, in order to recover the latest sequence number 
 when the RM restarts.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
       int latestSequenceNumber) {
     if (isFencedState()) {
       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
       return;
     }
     try {
       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
           latestSequenceNumber);
     } catch (Exception e) {
       notifyStoreOperationFailed(e);
     }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
       long renewDate) {
     try {
       LOG.info("updating RMDelegation token with sequence number: "
           + id.getSequenceNumber());
       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
           renewDate, id.getSequenceNumber());
     } catch (Exception e) {
       LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
           + id.getSequenceNumber());
       ExitUtil.terminate(1, e);
     }
   }
 {code}
 According to the code above, even when renewing a DT, the last sequence number 
 is updated in the store, which is wrong. For example, consider the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored, and hence recovered, last sequence number is 1, so the next DT 
 created after the RM restarts will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number ends up being overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
     LOG.info("recovering RMDelegationTokenSecretManager.");
 // 

[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265409#comment-14265409
 ] 

Varun Saxena commented on YARN-2958:


Thanks [~jianhe] and [~zjshen] for the review.

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch, YARN-2958.004.patch


 It seems that RMStateStore updates the last sequence number when storing or 
 updating each individual DT, in order to recover the latest sequence number 
 when the RM restarts.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
       RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
       int latestSequenceNumber) {
     if (isFencedState()) {
       LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
       return;
     }
     try {
       updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
           latestSequenceNumber);
     } catch (Exception e) {
       notifyStoreOperationFailed(e);
     }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
       long renewDate) {
     try {
       LOG.info("updating RMDelegation token with sequence number: "
           + id.getSequenceNumber());
       rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
           renewDate, id.getSequenceNumber());
     } catch (Exception e) {
       LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
           + id.getSequenceNumber());
       ExitUtil.terminate(1, e);
     }
   }
 {code}
 According to the code above, even when renewing a DT, the last sequence number 
 is updated in the store, which is wrong. For example, consider the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored, and hence recovered, last sequence number is 1, so the next DT 
 created after the RM restarts will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number ends up being overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
     LOG.info("recovering RMDelegationTokenSecretManager.");
     // recover RMDTMasterKeys
     for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
         .getMasterKeyState()) {
       addKey(dtKey);
     }
     // recover RMDelegationTokens
     Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
         rmState.getRMDTSecretManagerState().getTokenState();
     this.delegationTokenSequenceNumber =
         rmState.getRMDTSecretManagerState().getDTSequenceNumber();
     for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
         .entrySet()) {
       addPersistedDelegationToken(entry.getKey(), entry.getValue());
     }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store, which could be wrong. Fortunately, it is then 
 updated to the right number:
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers are gone through, and delegationTokenSequenceNumber 
 is set to the largest sequence number among them. Therefore, a new DT will be 
 assigned a sequence number that is always larger than those of all the 
 recovered DTs.
 To sum up, two negatives make a positive, but it's good to fix the issue. 
 Please let me know if I've missed something here.
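
Per the commit message quoted earlier in this thread (the store no longer 
updates the last sequence number when a token is merely updated), a minimal 
sketch of that direction; the store method name here is an assumption, not a 
quote from the patch:
{code}
// Sketch only: a renewal persists the token and renew date, but never touches
// the stored "last sequence number", which is written only for new tokens.
@Override
protected void updateStoredToken(RMDelegationTokenIdentifier id, long renewDate) {
  try {
    // Assumed method name; the real store API may differ.
    rmContext.getStateStore().updateRMDelegationToken(id, renewDate);
  } catch (Exception e) {
    ExitUtil.terminate(1, e);
  }
}
{code}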



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info

2015-01-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265407#comment-14265407
 ] 

Varun Saxena commented on YARN-2978:


Test failure is unrelated; it passes locally.

 ResourceManager crashes with NPE while getting queue info
 -

 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena
Priority: Critical
  Labels: capacityscheduler, resourcemanager
 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, 
 YARN-2978.003.patch, YARN-2978.004.patch


  java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh

2015-01-05 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265426#comment-14265426
 ] 

Jeff Zhang commented on YARN-3000:
--

[~aw] Thanks for the clarification; then it makes sense to deprecate 
YARN_PID_DIR.

 YARN_PID_DIR should be visible in yarn-env.sh
 -

 Key: YARN-3000
 URL: https://issues.apache.org/jira/browse/YARN-3000
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.6.0
Reporter: Jeff Zhang
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3000.patch


 Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the place 
 where users are supposed to set environment variables. IMO, yarn-env.sh is the 
 place for users to set environment variables, just like hadoop-env.sh, so it's 
 better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment, just 
 like YARN_RESOURCEMANAGER_HEAPSIZE).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-05 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Attachment: YARN-2230.002.patch

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
         maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same seems to apply to memory.
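
 For illustration only, capping (as the yarn-default.xml description promises) 
 rather than rejecting could look roughly like the following; this is a sketch 
 of the documented behavior, not a proposed or attached patch:
 {code}
 // Sketch: clamp the request to the configured maximums instead of throwing.
 Resource asked = resReq.getCapability();
 if (asked.getVirtualCores() > maximumResource.getVirtualCores()) {
   asked.setVirtualCores(maximumResource.getVirtualCores());
 }
 if (asked.getMemory() > maximumResource.getMemory()) {
   asked.setMemory(maximumResource.getMemory());
 }
 {code}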



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2015-01-05 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.26.patch

Reformatted some sections of TestLeafQueue, and commented out the null check for 
rmContext.getScheduler() in FiCaSchedulerApp to see how widespread that 
condition is in the tests.

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 For example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 
 up to 200 AMs can be launched. If each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they may occupy all of the queue's resources 
 instead of only a max_am_resource_percent of a queue.
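
 A minimal sketch of the direction the report implies, i.e. checking the actual 
 AM resource against the limit instead of a count derived from 
 minimum_allocation; the identifiers (totalAMResourceUsed, getAMResource()) are 
 hypothetical stand-ins, not the attached patch:
 {code}
 // Sketch only: activate an app only while the sum of AM resources stays within
 // queue_max_capacity * maximum_am_resource_percent.
 Resource amLimit = Resources.multiply(queueMaxCapacity, maxAMResourcePercent);
 Resource afterActivation =
     Resources.add(totalAMResourceUsed, application.getAMResource());
 if (Resources.greaterThan(resourceCalculator, clusterResource,
     afterActivation, amLimit)) {
   break; // would exceed the AM resource limit for this queue
 }
 {code}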



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-05 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265572#comment-14265572
 ] 

Chengbing Liu commented on YARN-2997:
-

{quote}
I think this is not possible given that we are looping 
this.context.getContainers() which is based on containerId to Container map. Or 
we can just use a list.
{quote}
We are looping over {{context.getContainers()}}, plus possible remainders from 
the previous heartbeat (in case of a lost heartbeat). If the previously 
completed container has its status changed somehow, two different 
ContainerStatus objects with the same ID would be reported. That's why I use a 
map, and use {{pendingCompletedContainers.put(containerId, containerStatus)}} 
instead of {{containerStatuses.add(containerStatus)}} directly, in order to 
prevent such duplications.
{quote}
then we should send the pendingCompletedContainers in getNMContainerStatuses 
method too
{quote}
We may not need to change {{getNMContainerStatuses}}, as it will send all 
container statuses in NM context, except the containers whose application is 
not in NM context. I think that will cover all elements in 
{{pendingCompletedContainers}}. And lost heartbeat is not a problem with 
{{getNMContainerStatuses}}.
{quote}
or we can just put it at the last line of 
removeOrTrackCompletedContainersFromContext so as to avoid the newly added 
method. 
{quote}
That's a good idea. I will change this in the next patch. Thanks for your 
advice!
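
A minimal sketch of the de-duplication described above; the field and method 
names are illustrative, not necessarily those in the attached patches:
{code}
// Sketch: key completed container statuses by ContainerId so a status carried
// over from a lost heartbeat is reported at most once.
private final Map<ContainerId, ContainerStatus> pendingCompletedContainers =
    new HashMap<ContainerId, ContainerStatus>();

private List<ContainerStatus> getCompletedContainerStatuses() {
  for (Container container : this.context.getContainers().values()) {
    ContainerStatus status = container.cloneAndGetContainerStatus();
    if (status.getState() == ContainerState.COMPLETE) {
      // put() overwrites any leftover entry for the same ContainerId.
      pendingCompletedContainers.put(status.getContainerId(), status);
    }
  }
  return new ArrayList<ContainerStatus>(pendingCompletedContainers.values());
}
{code}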

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3008) FairScheduler: Use lock for queuemanager instead of synchronized on FairScheduler

2015-01-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-3008:
--

Assignee: Varun Saxena

 FairScheduler: Use lock for queuemanager instead of synchronized on 
 FairScheduler
 -

 Key: YARN-3008
 URL: https://issues.apache.org/jira/browse/YARN-3008
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Varun Saxena

 Instead of a big monolithic lock on FairScheduler, we can have an explicit 
 lock on queuemanager and revisit all synchronized methods in FairScheduler
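
 A minimal sketch of what an explicit lock around QueueManager access could look 
 like, assuming a ReentrantReadWriteLock replaces synchronized(FairScheduler) 
 for queue lookups (illustrative only):
 {code}
 // Sketch: protect the queue hierarchy with its own read/write lock instead of
 // synchronizing every FairScheduler method on the scheduler object.
 private final ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();

 FSLeafQueue getLeafQueue(String name, boolean create) {
   queueLock.readLock().lock();
   try {
     FSLeafQueue queue = queueManager.getLeafQueue(name, false);
     if (queue != null || !create) {
       return queue;
     }
   } finally {
     queueLock.readLock().unlock();
   }
   // Creating a queue mutates the hierarchy, so take the write lock.
   queueLock.writeLock().lock();
   try {
     return queueManager.getLeafQueue(name, true);
   } finally {
     queueLock.writeLock().unlock();
   }
 }
 {code}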



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2015-01-05 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265480#comment-14265480
 ] 

Craig Welch commented on YARN-2637:
---


bq. Regarding null checks in FiCaSchedulerApp: since the scheduler assumes the 
application is in the running state when adding a FiCaSchedulerApp, it is a big 
issue if the RMApp cannot be found at that time. So rather than just ignoring 
such an error, I think you need to throw an exception (if that exception will 
not cause an RM shutdown) and log the error.

I'm not quite sure how to phrase this differently to get the point across - it 
is already the case, throughout the many mocking points which interact with this 
code, that the RMApp may be null at this point (if that were not the case it 
would not be necessary to check for it). As I mentioned previously, the 
ResourceManager itself checks for this case. I am not introducing the mocking 
which resulted in this state, or even the existing checks for it in non-test 
code; I'm receiving this state and carrying it forward in the same way as it has 
been done elsewhere (and, again, not only in tests). Changing this does not 
belong in the scope of this JIRA because it represents a rationalization/overhaul 
of mocking throughout this area (resource manager, schedulers); it is 
non-trivial and not specific to or properly within the scope of this change. 
Feel free to create a separate JIRA to improve the mocking throughout the code. 
The separate null check for the AM resource request is necessitated by the 
apparently intentional behavior of unmanaged AMs.

bq. And when is this possible?

+  if (rmContext.getScheduler() != null) 

Again, this occurs in existing test paths, and existing code is tolerant of it 
as well; I'm merely carrying it forward. It would belong in the new JIRA as 
well, were one opened.

bq. \t in leafqueue - I've checked and the spacing is consistent with the 
existing spacing in the file.
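
For context, the check under discussion is essentially the following defensive 
pattern in FiCaSchedulerApp, tolerating mocked RMContexts that have no 
scheduler; this is a sketch of the behavior described in the thread, not a 
verbatim excerpt of the patch:
{code}
// Sketch: fall back to a fixed minimum allocation when tests wire up an
// RMContext without a scheduler.
Resource amResource;
if (rmContext.getScheduler() != null) {
  amResource = rmContext.getScheduler().getMinimumResourceCapability();
} else {
  amResource = Resources.createResource(
      YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
}
{code}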


 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 For example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 
 up to 200 AMs can be launched. If each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they may occupy all of the queue's resources 
 instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2015-01-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265411#comment-14265411
 ] 

Wangda Tan commented on YARN-2637:
--

Hi [~cwelch],
Thanks for updating, the latest patch looks much cleaner to me.

Regarding null checks in FiCaSchedulerApp: since the scheduler assumes the 
application is in the running state when adding a FiCaSchedulerApp, it is a big 
issue if the RMApp cannot be found at that time. So rather than just ignoring 
such an error, I think you need to throw an exception (if that exception will 
not cause an RM shutdown) and log the error.

And when is this possible?
{code}
+  if (rmContext.getScheduler() != null) {
+amResource = rmContext.getScheduler().getMinimumResourceCapability();
+  }
{code}
If this is to address a mocking issue in tests, I suggest modifying the test 
logic to avoid such changes.

TestLeafQueue has some \t characters. Could you fix that please? 

Wangda

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 For example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 
 up to 200 AMs can be launched. If each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they may occupy all of the queue's resources 
 instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265516#comment-14265516
 ] 

Hadoop QA commented on YARN-2230:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690202/YARN-2230.002.patch
  against trunk revision 0c4b112.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6247//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6247//console

This message is automatically generated.

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
         maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at 

[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2015-01-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265530#comment-14265530
 ] 

Wangda Tan commented on YARN-2637:
--

Regarding the null checks, I agree we can just go ahead and file a separate 
ticket to address them; they are not caused by your patch. As you said, in the 
past the mocked tests forgot to set some fields and no one triggered that. But I 
think it will be helpful to solve at least the getScheduler() problem together 
with this patch, since the additional check will make people maintaining the 
code in the future spend more time thinking about why such checks exist.

And I've just checked that TestLeafQueue has some existing \t characters, but 
not many lines (about 20); can you reformat them in your patch? We shouldn't 
keep our new code consistent with the previous bad code style. :)

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 For example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 
 up to 200 AMs can be launched. If each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they may occupy all of the queue's resources 
 instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2015-01-05 Thread Chris K Wensel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265474#comment-14265474
 ] 

Chris K Wensel commented on YARN-1729:
--

Just want to point out that if a value passed to 
TimelineWebServices#parsePairStr starts with a number, it is parsed as a 
number, causing the filter to fail if the value is stored as a string. 

That is, the rules for parsing the query string (String vs Object) are not 
consistent with the put methods that parse the JSON entity.

ckw
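
A small self-contained illustration of the mismatch being described; this is 
not the actual TimelineWebServices#parsePairStr code, just a stand-in parser 
showing why a value with a numeric prefix stops matching:
{code}
// Illustration only: the PUT path stored the secondary-filter value as the
// String "7CCA...", but a query-side parser that converts numeric-looking
// prefixes returns a Number instead.
static Object parseFilterValue(String raw) {
  if (!raw.isEmpty() && Character.isDigit(raw.charAt(0))) {
    return Long.parseLong(raw.replaceAll("[^0-9].*$", "")); // "7CCA..." -> 7
  }
  return raw;
}
// Long.valueOf(7).equals("7CCA...") is false, so the filter never matches.
{code}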



 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265556#comment-14265556
 ] 

Jian He commented on YARN-2230:
---

[~vijaysbhat], you can assign it to yourself once you start working on it. I'm 
adding you to the contributor list.
Also, when you submit the patch, it'll be good to summarize what the patch does 
so that others can understand the patch more easily. Summarizing the patch is 
also useful for history tracking.

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
         maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same seems to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265625#comment-14265625
 ] 

Hadoop QA commented on YARN-2637:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690221/YARN-2637.26.patch
  against trunk revision 0c4b112.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6248//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6248//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6248//console

This message is automatically generated.

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator();
      i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 For example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, 
 the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, 
 up to 200 AMs can be launched. If each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they may occupy all of the queue's resources 
 instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2881) Implement PlanFollower for FairScheduler

2015-01-05 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265695#comment-14265695
 ] 

Subru Krishnan commented on YARN-2881:
--

Thanks [~adhoot] for making sure that the reservation system works with FS also, 
and [~kasha] for reviewing/committing it. Just for the record, I did look at the 
last version of the patch with [~adhoot] this afternoon and it looked good to me.

 Implement PlanFollower for FairScheduler
 

 Key: YARN-2881
 URL: https://issues.apache.org/jira/browse/YARN-2881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-2881.001.patch, YARN-2881.002.patch, 
 YARN-2881.002.patch, YARN-2881.003.patch, YARN-2881.004.patch, 
 YARN-2881.005.patch, YARN-2881.006.patch, YARN-2881.prelim.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-05 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265663#comment-14265663
 ] 

Chengbing Liu commented on YARN-2997:
-

[~jianhe] Can we perhaps deal with the {{getNMContainerStatuses}} issue in another 
JIRA? This one has not changed anything for RM restart yet. If so, the only 
thing left is the {{pendingCompletedContainers.clear()}} part. What do you 
think?

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number

2015-01-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-3009:
---

Assignee: Naganarasimha G R

 TimelineWebServices always parses primary and secondary filters as numbers if 
 first char is a number
 

 Key: YARN-3009
 URL: https://issues.apache.org/jira/browse/YARN-3009
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Chris K Wensel
Assignee: Naganarasimha G R

 If you pass a filter value that starts with a number (7CCA...), the filter 
 value will be parsed into the Number '7' causing the filter to fail the 
 search.
 Should be noted the actual value as stored via a PUT operation is properly 
 parsed and stored as a String.
 This manifests as a very hard to identify issue with DAGClient in Apache Tez 
 and naming dags/vertices with alphanumeric guid values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number

2015-01-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265672#comment-14265672
 ] 

Naganarasimha G R commented on YARN-3009:
-

Hi [~cwensel]
I have assigned this issue to myself; if you want to work on it or are already 
working on it, please feel free to reassign.

 TimelineWebServices always parses primary and secondary filters as numbers if 
 first char is a number
 

 Key: YARN-3009
 URL: https://issues.apache.org/jira/browse/YARN-3009
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Chris K Wensel
Assignee: Naganarasimha G R

 If you pass a filter value that starts with a number (7CCA...), the filter 
 value will be parsed into the Number '7' causing the filter to fail the 
 search.
 Should be noted the actual value as stored via a PUT operation is properly 
 parsed and stored as a String.
 This manifests as a very hard to identify issue with DAGClient in Apache Tez 
 and naming dags/vertices with alphanumeric guid values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-05 Thread Vijay Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265677#comment-14265677
 ] 

Vijay Bhat commented on YARN-2230:
--

Thanks Jian! I've submitted the updated patch.

Patch summary:

I've updated the text description for the following properties in 
yarn-default.xml to be reflective of the actual behavior in the code.

* yarn.scheduler.minimum-allocation-vcores
* yarn.scheduler.maximum-allocation-vcores
* yarn.scheduler.minimum-allocation-mb
* yarn.scheduler.maximum-allocation-mb

This is a documentation patch and does not change any code behavior.

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this exception 
 is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same seems to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265637#comment-14265637
 ] 

Jian He commented on YARN-2997:
---

bq. except the containers whose application is not in NM context.
I think we should send containers whose application is not in NMContext too for 
recovery.

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-05 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat reassigned YARN-2230:


Assignee: Vijay Bhat

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this exception 
 is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same seems to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance

2015-01-05 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265707#comment-14265707
 ] 

Yi Liu commented on YARN-2996:
--

Thanks [~zjshen] for the review.
You are right: for the *.new* and *.tmp* files, the existing code uses them for some 
checks. But actually the incompatible issue you mentioned is really rare and 
it's not a big issue. {{checkAndResumeUpdateOperation}} exists because we write 
state to a *.tmp* file, then rename it to a *.new* file, and finally rename that to 
_output\_file_. If we remove the step of renaming to the *.new* file, we can remove 
this function too.
Anyway, I will revert this modification.

So in the new patch, I only keep #1 as described in the description. I also add two 
new fixes in the new patch:
*1.* we missed *synchronized* for {{updateRMDelegationTokenState}}
*2.* Add the fix of YARN-3004 to this patch, since {{MemoryRMStateStore}} is only 
used in tests and we can fix it in this patch too.
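
For reference, a rough sketch of what an {{updateFile}} without the intermediate *.new* rename could look like (assumptions: plain Hadoop FileSystem API, hypothetical class and helper names; this is not the actual patch):
{code}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: write to a temporary file, then rename it directly
// over the target, skipping the extra ".new" rename described above.
public class UpdateFileSketch {
  private final FileSystem fs;

  public UpdateFileSketch(FileSystem fs) {
    this.fs = fs;
  }

  public void updateFile(Path outputPath, byte[] data) throws Exception {
    Path tempPath = new Path(outputPath.getParent(), outputPath.getName() + ".tmp");
    try (FSDataOutputStream out = fs.create(tempPath, true)) {
      out.write(data);                       // write the new state into the .tmp file
    }
    fs.delete(outputPath, false);            // remove the old file if present
    if (!fs.rename(tempPath, outputPath)) {  // single rename into place
      throw new Exception("Failed to rename " + tempPath + " to " + outputPath);
    }
  }
}
{code}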

 Refine some fs operations in FileSystemRMStateStore to improve performance
 --

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2996.001.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* There are several places that invoke {{fs.exists}} and then 
 {{fs.getFileStatus}}; we can merge them to save one RPC call
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 The {{updateFile}} is not good either: it writes the file to _output\_file_.tmp, then 
 renames it to _output\_file_.new, and then renames that to _output\_file_; we can 
 drop one rename operation.
 Also, there is one unnecessary import that we can remove.
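
A small sketch of the merge described in *1.* (an assumption about how it could be done, not the patch itself): {{FileSystem.getFileStatus}} throws {{FileNotFoundException}} when the path is absent, so a single call can replace the exists/getFileStatus pair.
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative helper: one RPC instead of fs.exists() followed by fs.getFileStatus().
public final class FsStatusUtil {
  private FsStatusUtil() {}

  /** Returns the FileStatus of the path, or null if it does not exist. */
  public static FileStatus getFileStatusOrNull(FileSystem fs, Path path)
      throws IOException {
    try {
      return fs.getFileStatus(path);   // single NameNode RPC
    } catch (FileNotFoundException e) {
      return null;                     // path does not exist
    }
  }
}
{code}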



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number

2015-01-05 Thread Chris K Wensel (JIRA)
Chris K Wensel created YARN-3009:


 Summary: TimelineWebServices always parses primary and secondary 
filters as numbers if first char is a number
 Key: YARN-3009
 URL: https://issues.apache.org/jira/browse/YARN-3009
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Chris K Wensel


If you pass a filter value that starts with a number (7CCA...), the filter 
value will be parsed into the Number '7' causing the filter to fail the search.

Should be noted the actual value as stored via a PUT operation is properly 
parsed and stored as a String.

This manifests as a very hard to identify issue with DAGClient in Apache Tez 
and naming dags/vertices with alphanumeric guid values.
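
Not the timeline server's actual parsing code, but a quick way to see the failure mode: lenient number parsing stops at the first non-numeric character, so a value like 7CCA... collapses to the number 7 (illustration only, with a made-up filter value).
{code}
import java.text.NumberFormat;
import java.text.ParsePosition;

// Demonstrates the general pitfall: a lenient numeric parse of "7CCA..." yields 7,
// which no longer matches the stored String value. (This is not the
// TimelineWebServices code path.)
public class LenientParseDemo {
  public static void main(String[] args) {
    String filterValue = "7CCA0E5D9A3B4C1E";
    ParsePosition pos = new ParsePosition(0);
    Number parsed = NumberFormat.getInstance().parse(filterValue, pos);
    System.out.println("parsed = " + parsed + ", consumed chars = " + pos.getIndex());
    // prints: parsed = 7, consumed chars = 1 -> comparing 7 against the stored
    // String "7CCA0E5D9A3B4C1E" can never match.
  }
}
{code}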



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-05 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2997:
--
Assignee: Chengbing Liu

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance

2015-01-05 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated YARN-2996:
-
Attachment: YARN-2996.002.patch

 Refine some fs operations in FileSystemRMStateStore to improve performance
 --

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2996.001.patch, YARN-2996.002.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* There are several places invoke {{fs.exists}}, then 
 {{fs.getFileStatus}}, we can merge them to save one RPC call
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + 
 .new);
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then 
 rename to _output\_file_.new, then rename it to _output\_file_, we can reduce 
 one rename operation.
 Also there is one unnecessary import, we can remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2015-01-05 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264827#comment-14264827
 ] 

Chris Trezzo commented on YARN-2217:


Test failure seems unrelated.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch, YARN-2217-trunk-v5.patch, 
 YARN-2217-trunk-v6.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264831#comment-14264831
 ] 

Varun Saxena commented on YARN-2958:


bq. If you take a look at the old addStoreOrUpdateOps. Storing DT and writing 
last sequence number is put in the same opList, hence both are executed or 
neither.
[~zjshen], I had actually made it conditional in the patch i.e. sequence number 
will be put in the opList only if isUpdateSeqNo is enabled.
Anyway, if we go by the assumption that if the znode doesn't exist when updating, 
we suspect the DT is not written and neither is the sequence number, then we do not 
need the isUpdateSeqNo flag. I will make the change and upload a new patch.
Thanks for the review.

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch


 It seems that RMStateStore updates last sequence number when storing or 
 updating each individual DT, to recover the latest sequence number when RM 
 restarting.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
   RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
   int latestSequenceNumber) {
 if(isFencedState()) {
   LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
   return;
 }
 try {
   updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, 
 renewDate,
   latestSequenceNumber);
 } catch (Exception e) {
   notifyStoreOperationFailed(e);
 }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
   long renewDate) {
 try {
   LOG.info("updating RMDelegation token with sequence number: "
       + id.getSequenceNumber());
   rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
       renewDate, id.getSequenceNumber());
 } catch (Exception e) {
   LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
       + id.getSequenceNumber());
   ExitUtil.terminate(1, e);
 }
   }
 {code}
 According to code above, even when renewing a DT, the last sequence number is 
 updated in the store, which is wrong. For example, we have the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2( seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored and then recovered last sequence number is 1. As a result, the next 
 DT created after the RM restarts will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
 LOG.info("recovering RMDelegationTokenSecretManager.");
 // recover RMDTMasterKeys
 for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
   .getMasterKeyState()) {
   addKey(dtKey);
 }
 // recover RMDelegationTokens
 Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
     rmState.getRMDTSecretManagerState().getTokenState();
 this.delegationTokenSequenceNumber =
     rmState.getRMDTSecretManagerState().getDTSequenceNumber();
 for (Map.Entry<RMDelegationTokenIdentifier, Long> entry :
     rmDelegationTokens.entrySet()) {
   addPersistedDelegationToken(entry.getKey(), entry.getValue());
 }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
 delegationTokenSequenceNumber updates it to the right number.
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers will be gone through, and 
 delegationTokenSequenceNumber will be set to the largest sequence number 
 among these identifiers. Therefore, new DT will be assigned a sequence number 
 which is always larger than that of all the recovered DT.
 To sum up, two negatives make a positive, but it's good to fix the issue. 
 Please let me know if I've missed something here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1177) Support automatic failover using ZKFC

2015-01-05 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-1177:

Assignee: (was: Akira AJISAKA)

 Support automatic failover using ZKFC
 -

 Key: YARN-1177
 URL: https://issues.apache.org/jira/browse/YARN-1177
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
 Attachments: yarn-1177-ancient-version.patch


 Prior to embedding leader election and failover controller in the RM 
 (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic 
 failover implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264823#comment-14264823
 ] 

Zhijie Shen commented on YARN-2958:
---

bq. And if it doesnt exist we store it as a new token(not update it). In this 
case, I think we should not overwrite the sequence number.

If you take a look at the old addStoreOrUpdateOps: storing the DT and writing the 
last sequence number are put in the same opList, hence both are executed or neither. 
If the znode doesn't exist when updating, we suspect the DT was not written, and 
neither was the sequence number.
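
For readers unfamiliar with the opList mechanism mentioned above, this is roughly how a ZooKeeper multi-op makes the two writes all-or-nothing (a generic sketch against the plain ZooKeeper API with made-up paths, not the ZKRMStateStore code):
{code}
import java.util.Arrays;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Generic illustration: both operations are submitted in one multi() call,
// so either the DT znode and the sequence-number update both land, or neither does.
public class AtomicDtStoreSketch {
  public static void storeTokenAndSeqNum(ZooKeeper zk, String dtPath,
      byte[] tokenData, String seqNumPath, byte[] seqNumData) throws Exception {
    zk.multi(Arrays.asList(
        Op.create(dtPath, tokenData, ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT),                // write the DT znode
        Op.setData(seqNumPath, seqNumData, -1)));  // update the last sequence number
  }
}
{code}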

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch


 It seems that RMStateStore updates last sequence number when storing or 
 updating each individual DT, to recover the latest sequence number when RM 
 restarting.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
   RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
   int latestSequenceNumber) {
 if(isFencedState()) {
   LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
   return;
 }
 try {
   updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, 
 renewDate,
   latestSequenceNumber);
 } catch (Exception e) {
   notifyStoreOperationFailed(e);
 }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
   long renewDate) {
 try {
   LOG.info("updating RMDelegation token with sequence number: "
       + id.getSequenceNumber());
   rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
       renewDate, id.getSequenceNumber());
 } catch (Exception e) {
   LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
       + id.getSequenceNumber());
   ExitUtil.terminate(1, e);
 }
   }
 {code}
 According to code above, even when renewing a DT, the last sequence number is 
 updated in the store, which is wrong. For example, we have the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2( seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored and then recovered last sequence number is 1. As a result, the next 
 DT created after the RM restarts will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
 LOG.info("recovering RMDelegationTokenSecretManager.");
 // recover RMDTMasterKeys
 for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
   .getMasterKeyState()) {
   addKey(dtKey);
 }
 // recover RMDelegationTokens
 Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
     rmState.getRMDTSecretManagerState().getTokenState();
 this.delegationTokenSequenceNumber =
     rmState.getRMDTSecretManagerState().getDTSequenceNumber();
 for (Map.Entry<RMDelegationTokenIdentifier, Long> entry :
     rmDelegationTokens.entrySet()) {
   addPersistedDelegationToken(entry.getKey(), entry.getValue());
 }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
 delegationTokenSequenceNumber updates it to the right number.
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers will be gone through, and 
 delegationTokenSequenceNumber will be set to the largest sequence number 
 among these identifiers. Therefore, new DT will be assigned a sequence number 
 which is always larger than that of all the recovered DT.
 To sum up, two negatives make a positive, but it's good to fix the issue. 
 Please let me know if I've missed something here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1177) Support automatic failover using ZKFC

2015-01-05 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264842#comment-14264842
 ] 

Akira AJISAKA commented on YARN-1177:
-

After rethinking this, I don't need ZKFC for RM failover. What I really want to do 
is to improve the error message when attempting graceful failover without ZKFC.
Feel free to take it over, thanks.

 Support automatic failover using ZKFC
 -

 Key: YARN-1177
 URL: https://issues.apache.org/jira/browse/YARN-1177
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Akira AJISAKA
 Attachments: yarn-1177-ancient-version.patch


 Prior to embedding leader election and failover controller in the RM 
 (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic 
 failover implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265018#comment-14265018
 ] 

Zhijie Shen commented on YARN-2958:
---

The last patch looks good to me. [~jianhe], do you have any further comments? 
Otherwise, I'll commit the patch late today.

In AbstractDelegationTokenSecretManager, the following code should no longer be 
useful in the YARN scope. However, in case another impl of 
AbstractDelegationTokenSecretManager doesn't store the last sequence number 
separately, let's still keep this logic.
{code}
if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
  setDelegationTokenSeqNum(identifier.getSequenceNumber());
}
{code}

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch, YARN-2958.004.patch


 It seems that RMStateStore updates last sequence number when storing or 
 updating each individual DT, to recover the latest sequence number when RM 
 restarting.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
   RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
   int latestSequenceNumber) {
 if(isFencedState()) {
   LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
   return;
 }
 try {
   updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, 
 renewDate,
   latestSequenceNumber);
 } catch (Exception e) {
   notifyStoreOperationFailed(e);
 }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
   long renewDate) {
 try {
   LOG.info("updating RMDelegation token with sequence number: "
       + id.getSequenceNumber());
   rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
       renewDate, id.getSequenceNumber());
 } catch (Exception e) {
   LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
       + id.getSequenceNumber());
   ExitUtil.terminate(1, e);
 }
   }
 {code}
 According to code above, even when renewing a DT, the last sequence number is 
 updated in the store, which is wrong. For example, we have the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2( seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored and then recovered last sequence number is 1. As a result, the next 
 DT created after the RM restarts will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
 LOG.info("recovering RMDelegationTokenSecretManager.");
 // recover RMDTMasterKeys
 for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
   .getMasterKeyState()) {
   addKey(dtKey);
 }
 // recover RMDelegationTokens
 Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
     rmState.getRMDTSecretManagerState().getTokenState();
 this.delegationTokenSequenceNumber =
     rmState.getRMDTSecretManagerState().getDTSequenceNumber();
 for (Map.Entry<RMDelegationTokenIdentifier, Long> entry :
     rmDelegationTokens.entrySet()) {
   addPersistedDelegationToken(entry.getKey(), entry.getValue());
 }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
 delegationTokenSequenceNumber updates it to the right number.
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers will be gone through, and 
 delegationTokenSequenceNumber will be set to the largest sequence number 
 among these identifiers. Therefore, new DT will be assigned a sequence number 
 which is always larger than that of all the recovered DT.
 To sum up, two negatives make a positive, but it's good to fix the issue. 
 Please let me know if I've missed something here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265043#comment-14265043
 ] 

Hadoop QA commented on YARN-2958:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690110/YARN-2958.004.patch
  against trunk revision dfd2589.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6244//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6244//console

This message is automatically generated.

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch, YARN-2958.004.patch


 It seems that RMStateStore updates last sequence number when storing or 
 updating each individual DT, to recover the latest sequence number when RM 
 restarting.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
   RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
   int latestSequenceNumber) {
 if(isFencedState()) {
   LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
   return;
 }
 try {
   updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, 
 renewDate,
   latestSequenceNumber);
 } catch (Exception e) {
   notifyStoreOperationFailed(e);
 }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
   long renewDate) {
 try {
   LOG.info("updating RMDelegation token with sequence number: "
       + id.getSequenceNumber());
   rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
       renewDate, id.getSequenceNumber());
 } catch (Exception e) {
   LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
       + id.getSequenceNumber());
   ExitUtil.terminate(1, e);
 }
   }
 {code}
 According to code above, even when renewing a DT, the last sequence number is 
 updated in the store, which is wrong. For example, we have the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2( seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored and then recovered last sequence number is 1. As a result, the next 
 DT created after the RM restarts will conflict with DT 2 on sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
   public void recover(RMState rmState) throws Exception {
 LOG.info("recovering RMDelegationTokenSecretManager.");
 // recover RMDTMasterKeys
 for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
   .getMasterKeyState()) {
   addKey(dtKey);
 }
 // recover RMDelegationTokens
 Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
     rmState.getRMDTSecretManagerState().getTokenState();
 this.delegationTokenSequenceNumber =
     rmState.getRMDTSecretManagerState().getDTSequenceNumber();
 for (Map.Entry<RMDelegationTokenIdentifier, Long> entry :
     rmDelegationTokens.entrySet()) {
   addPersistedDelegationToken(entry.getKey(), entry.getValue());
 }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
 delegationTokenSequenceNumber updates it to the right number.
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   

[jira] [Assigned] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-01-05 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned YARN-2716:
---

Assignee: (was: Robert Kanter)

Sure, go ahead.

 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
 simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled

2015-01-05 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-3006:
---

 Summary: Improve the error message when attempting manual failover 
with auto-failover enabled
 Key: YARN-3006
 URL: https://issues.apache.org/jira/browse/YARN-3006
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor


When executing manual failover with automatic failover enabled, 
UnsupportedOperationException is thrown.
{code}
# yarn rmadmin -failover rm1 rm2
Exception in thread "main" java.lang.UnsupportedOperationException: 
RMHAServiceTarget doesn't have a corresponding ZKFC address
at 
org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
at 
org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
at 
org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311)
at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282)
at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449)
at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378)
at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622)
{code}
I'm thinking the above message is confusing to users. (Users may wonder whether 
ZKFC is configured correctly...) The command should output an error message to 
stderr instead of throwing an Exception.
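
One possible shape for the improvement (a hedged sketch only, with a made-up class and message; the real fix would live in the RM admin CLI path and may look different):
{code}
// Hypothetical pattern: catch the unsupported-operation case and turn it into a
// readable stderr message plus a non-zero exit code, instead of a stack trace.
public class FailoverCommandSketch {
  static int runFailover(Runnable gracefulFailover) {
    try {
      gracefulFailover.run();
      return 0;
    } catch (UnsupportedOperationException e) {
      System.err.println("failover: manual (graceful) failover is not supported when"
          + " automatic failover is enabled; no ZKFC address is configured for the RM.");
      return -1;
    }
  }
}
{code}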




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1177) Support automatic failover using ZKFC

2015-01-05 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264897#comment-14264897
 ] 

Akira AJISAKA commented on YARN-1177:
-

Filed YARN-3006 to improve the error message.

 Support automatic failover using ZKFC
 -

 Key: YARN-1177
 URL: https://issues.apache.org/jira/browse/YARN-1177
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Karthik Kambatla
 Attachments: yarn-1177-ancient-version.patch


 Prior to embedding leader election and failover controller in the RM 
 (YARN-1029), it might be a good idea to use ZKFC for a first-cut automatic 
 failover implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance

2015-01-05 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264901#comment-14264901
 ] 

Zhijie Shen commented on YARN-2996:
---

bq. we can merge them to save one RPC call

It sounds a good idea

bq. we can reduce one rename operation.

This will affect the mechanism where we use the *.new* file to recover the actual 
state file when recovering the RM. It needs to be taken care of, too. Perhaps we 
can't simply remove the logic that recovers the actual state file from the *.new* 
file, and I can think of a rare incompatible issue. See the following procedure:

1. Old FS RMStateStore writes state file and fails after .new file is created.
2. RM stops.
3. RM is upgraded and so does FS RMStateStore.
4. RM starts again.
5. The new FS RMStateStore will not recover the *.new* file, and may mistake it for 
a normal file.

 Refine some fs operations in FileSystemRMStateStore to improve performance
 --

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2996.001.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* There are several places invoke {{fs.exists}}, then 
 {{fs.getFileStatus}}, we can merge them to save one RPC call
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + 
 .new);
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then 
 rename to _output\_file_.new, then rename it to _output\_file_, we can reduce 
 one rename operation.
 Also there is one unnecessary import, we can remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2958:
---
Attachment: YARN-2958.004.patch

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch, YARN-2958.004.patch


 It seems that RMStateStore updates last sequence number when storing or 
 updating each individual DT, to recover the latest sequence number when RM 
 restarting.
 First, the current logic seems to be problematic:
 {code}
   public synchronized void updateRMDelegationTokenAndSequenceNumber(
   RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
   int latestSequenceNumber) {
 if(isFencedState()) {
   LOG.info(State store is in Fenced state. Can't update RM Delegation 
 Token.);
   return;
 }
 try {
   updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, 
 renewDate,
   latestSequenceNumber);
 } catch (Exception e) {
   notifyStoreOperationFailed(e);
 }
   }
 {code}
 {code}
   @Override
   protected void updateStoredToken(RMDelegationTokenIdentifier id,
   long renewDate) {
 try {
   LOG.info(updating RMDelegation token with sequence number: 
   + id.getSequenceNumber());
   rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
 renewDate, id.getSequenceNumber());
 } catch (Exception e) {
   LOG.error(Error in updating persisted RMDelegationToken with sequence 
 number: 
 + id.getSequenceNumber());
   ExitUtil.terminate(1, e);
 }
   }
 {code}
 According to code above, even when renewing a DT, the last sequence number is 
 updated in the store, which is wrong. For example, we have the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2( seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored and then recovered last sequence number is 1. It makes the next 
 created DT after RM restarting will conflict with DT 2 on sequence num.
 Second, the aforementioned bug doesn't happen actually, because the recovered 
 last sequence num has been overwritten at by the correctly one.
 {code}
   public void recover(RMState rmState) throws Exception {
 LOG.info(recovering RMDelegationTokenSecretManager.);
 // recover RMDTMasterKeys
 for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
   .getMasterKeyState()) {
   addKey(dtKey);
 }
 // recover RMDelegationTokens
 MapRMDelegationTokenIdentifier, Long rmDelegationTokens =
 rmState.getRMDTSecretManagerState().getTokenState();
 this.delegationTokenSequenceNumber =
 rmState.getRMDTSecretManagerState().getDTSequenceNumber();
 for (Map.EntryRMDelegationTokenIdentifier, Long entry : 
 rmDelegationTokens
   .entrySet()) {
   addPersistedDelegationToken(entry.getKey(), entry.getValue());
 }
   }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store. It could be wrong. Fortunately, 
 delegationTokenSequenceNumber updates it to the right number.
 {code}
 if (identifier.getSequenceNumber()  getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers will be gone through, and 
 delegationTokenSequenceNumber will be set to the largest sequence number 
 among these identifiers. Therefore, new DT will be assigned a sequence number 
 which is always larger than that of all the recovered DT.
 To sum up, two negatives make a positive, but it's good to fix the issue. 
 Please let me know if I've missed something here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info

2015-01-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264912#comment-14264912
 ] 

Jian He commented on YARN-2978:
---

Looks good.
I think we may just synchronize AbstractQueue#getQueueInfo, because the caller 
is synchronized anyway, or use the AbstractQueue#getUsedCapacity method so as 
to avoid adding usedCapacity to the exclusion list?
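
To illustrate the first option (a generic sketch with made-up field and class names, not the CapacityScheduler code): building the queue-info snapshot under the same lock that guards the queue's mutable fields prevents a reader from seeing a half-updated state.
{code}
// Hypothetical queue-like class: getQueueInfo() synchronizes on the same monitor
// that setCapacities() uses, so a snapshot never observes partially updated fields.
public class QueueSketch {
  private float usedCapacity;
  private float absoluteUsedCapacity;

  public synchronized void setCapacities(float used, float absoluteUsed) {
    this.usedCapacity = used;
    this.absoluteUsedCapacity = absoluteUsed;
  }

  public synchronized QueueInfoSnapshot getQueueInfo() {
    // Both fields are read under the lock, so they are always mutually consistent.
    return new QueueInfoSnapshot(usedCapacity, absoluteUsedCapacity);
  }

  public static final class QueueInfoSnapshot {
    public final float usedCapacity;
    public final float absoluteUsedCapacity;

    QueueInfoSnapshot(float usedCapacity, float absoluteUsedCapacity) {
      this.usedCapacity = usedCapacity;
      this.absoluteUsedCapacity = absoluteUsedCapacity;
    }
  }
}
{code}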

 ResourceManager crashes with NPE while getting queue info
 -

 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena
Priority: Critical
  Labels: capacityscheduler, resourcemanager
 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, 
 YARN-2978.003.patch


  java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2978) ResourceManager crashes with NPE while getting queue info

2015-01-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2978:
---
Attachment: YARN-2978.004.patch

 ResourceManager crashes with NPE while getting queue info
 -

 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena
Priority: Critical
  Labels: capacityscheduler, resourcemanager
 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, 
 YARN-2978.003.patch, YARN-2978.004.patch


  java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264975#comment-14264975
 ] 

Jian He commented on YARN-2997:
---

bq. we may run into a situation where we report two different ContainerStatus 
with same ID
I think this is not possible, given that we are looping over 
{{this.context.getContainers()}}, which is backed by a containerId-to-Container 
map. Alternatively, we could just use a list.
bq. So in my opinion, that could be a potential leak.
I see. Then we should send the pendingCompletedContainers in the 
getNMContainerStatuses method too, and {{pendingCompletedContainers.clear()}} 
should be moved after {{if (response.getNodeAction() == NodeAction.RESYNC)}}, 
or we could simply put it on the last line of 
removeOrTrackCompletedContainersFromContext so as to avoid the newly added 
method.
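
A rough, simplified sketch of the second option above, with the clear() call as the last line of removeOrTrackCompletedContainersFromContext. Container ids are plain Strings and the NM context is a bare map here; the real NodeStatusUpdaterImpl types and signature differ.

{code}
// Hypothetical, simplified sketch (not the actual NodeStatusUpdaterImpl):
// clear pendingCompletedContainers as the last step of
// removeOrTrackCompletedContainersFromContext, so no separate clearing
// method is needed and acked containers are not re-reported every heartbeat.
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PendingCompletedContainersSketch {
  private final Map<String, Object> liveContainers = new HashMap<>();
  private final Set<String> pendingCompletedContainers = new HashSet<>();

  void removeOrTrackCompletedContainersFromContext(List<String> ackedByRM) {
    for (String id : ackedByRM) {
      liveContainers.remove(id);        // RM has acknowledged these containers
    }
    // Last line, as suggested above: forget statuses that have been reported
    // and acknowledged, so they are not sent again on the next heartbeat.
    pendingCompletedContainers.clear();
  }

  public static void main(String[] args) {
    PendingCompletedContainersSketch nm = new PendingCompletedContainersSketch();
    nm.liveContainers.put("container_1", new Object());
    nm.pendingCompletedContainers.add("container_1");
    nm.removeOrTrackCompletedContainersFromContext(Arrays.asList("container_1"));
    System.out.println("pending after ack: " + nm.pendingCompletedContainers);
  }
}
{code}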

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.patch


 We have seen a lot of the following in the RM log:
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by the NM sending completed containers repeatedly until the app 
 is finished. On the RM side, the container has already been released, hence 
 {{getRMContainer}} returns null.
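
For reference, a hypothetical sketch of that RM-side path: the completed-container handler just logs and returns when the container has already been released, which is why the message is harmless but repeats on every heartbeat that re-reports the same finished container. This is an illustration, not the actual FairScheduler code.

{code}
// Hypothetical sketch (not the actual FairScheduler): when the RM has already
// released the container, the lookup returns null and the handler bails out.
public class NullContainerCompletedSketch {

  static void completedContainer(Object rmContainer, String containerId) {
    if (rmContainer == null) {
      System.out.println("Null container completed: " + containerId);
      return;   // already released on the RM side; nothing left to do
    }
    // ... normal completion handling would go here ...
  }

  public static void main(String[] args) {
    completedContainer(null, "container_1420000000000_0001_01_000002");
  }
}
{code}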



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh

2015-01-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264709#comment-14264709
 ] 

Allen Wittenauer commented on YARN-3000:


It was part of HADOOP-9902, which is why it isn't in branch-2 at all, much less 2.6.

 YARN_PID_DIR should be visible in yarn-env.sh
 -

 Key: YARN-3000
 URL: https://issues.apache.org/jira/browse/YARN-3000
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 2.6.0
Reporter: Jeff Zhang
Assignee: Rohith
Priority: Minor
 Attachments: 0001-YARN-3000.patch


 Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not the 
 place for users to set up environment variables. IMO, yarn-env.sh is the 
 place for users to set environment variables, just like hadoop-env.sh, so 
 it's better to put YARN_PID_DIR into yarn-env.sh (it can be added as a 
 comment, just like YARN_RESOURCEMANAGER_HEAPSIZE).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info

2015-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265074#comment-14265074
 ] 

Hadoop QA commented on YARN-2978:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690124/YARN-2978.004.patch
  against trunk revision dfd2589.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6245//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6245//console

This message is automatically generated.

 ResourceManager crashes with NPE while getting queue info
 -

 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena
Priority: Critical
  Labels: capacityscheduler, resourcemanager
 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, 
 YARN-2978.003.patch, YARN-2978.004.patch


  java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3007) TestNMWebServices#testContainerLogs fails intermittently

2015-01-05 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3007:
-

 Summary: TestNMWebServices#testContainerLogs fails intermittently
 Key: YARN-3007
 URL: https://issues.apache.org/jira/browse/YARN-3007
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor


TestNMWebServices#testContainerLogs fails intermittently with JDK 7:

{noformat}
java.lang.AssertionError: Failed to create log dir
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogs(TestNMWebServices.java:336)
{noformat}
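
One plausible source of this kind of flakiness (an assumption, not a confirmed diagnosis) is that {{File.mkdirs()}} returns false when the directory already exists from an earlier run, so asserting on its return value alone can fail intermittently. A self-contained sketch of a more tolerant setup check:

{code}
// Hypothetical sketch: mkdirs() returns false if the directory already exists,
// so treat "created now" and "already there" the same when setting up test dirs.
import java.io.File;

public class LogDirSetupSketch {
  public static void main(String[] args) {
    File logDir = new File(System.getProperty("java.io.tmpdir"), "nm-test-logs");
    boolean usable = logDir.mkdirs() || logDir.isDirectory();
    if (!usable) {
      throw new AssertionError("Failed to create log dir " + logDir);
    }
    System.out.println("log dir ready: " + logDir);
  }
}
{code}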



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2015-01-05 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265118#comment-14265118
 ] 

Jian He commented on YARN-2958:
---

lgtm, thanks [~varun_saxena] and [~zjshen] !

 RMStateStore seems to unnecessarily and wrongly store sequence number 
 separately
 

 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-2958.001.patch, YARN-2958.002.patch, 
 YARN-2958.003.patch, YARN-2958.004.patch


 It seems that RMStateStore updates the last sequence number when storing or 
 updating each individual DT, so that the latest sequence number can be 
 recovered when the RM restarts.
 First, the current logic seems to be problematic:
 {code}
 public synchronized void updateRMDelegationTokenAndSequenceNumber(
     RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
     int latestSequenceNumber) {
   if (isFencedState()) {
     LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
     return;
   }
   try {
     updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
         latestSequenceNumber);
   } catch (Exception e) {
     notifyStoreOperationFailed(e);
   }
 }
 {code}
 {code}
 @Override
 protected void updateStoredToken(RMDelegationTokenIdentifier id,
     long renewDate) {
   try {
     LOG.info("updating RMDelegation token with sequence number: "
         + id.getSequenceNumber());
     rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
         renewDate, id.getSequenceNumber());
   } catch (Exception e) {
     LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
         + id.getSequenceNumber());
     ExitUtil.terminate(1, e);
   }
 }
 {code}
 According to the code above, even when renewing a DT, the last sequence number 
 is updated in the store, which is wrong. For example, consider the following 
 sequence:
 1. Get DT 1 (seq = 1)
 2. Get DT 2 (seq = 2)
 3. Renew DT 1 (seq = 1)
 4. Restart RM
 The stored, and then recovered, last sequence number is 1, so the next DT 
 created after the RM restarts would conflict with DT 2 on the sequence number.
 Second, the aforementioned bug doesn't actually happen, because the recovered 
 last sequence number is overwritten by the correct one.
 {code}
 public void recover(RMState rmState) throws Exception {
   LOG.info("recovering RMDelegationTokenSecretManager.");
   // recover RMDTMasterKeys
   for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
       .getMasterKeyState()) {
     addKey(dtKey);
   }
   // recover RMDelegationTokens
   Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
       rmState.getRMDTSecretManagerState().getTokenState();
   this.delegationTokenSequenceNumber =
       rmState.getRMDTSecretManagerState().getDTSequenceNumber();
   for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
       .entrySet()) {
     addPersistedDelegationToken(entry.getKey(), entry.getValue());
   }
 }
 {code}
 The code above recovers delegationTokenSequenceNumber by reading the last 
 sequence number in the store, which could be the wrong value. Fortunately, 
 recovering the individual identifiers (via addPersistedDelegationToken) then 
 updates it to the right number:
 {code}
 if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
   setDelegationTokenSeqNum(identifier.getSequenceNumber());
 }
 {code}
 All the stored identifiers are iterated over, and delegationTokenSequenceNumber 
 is set to the largest sequence number among them. Therefore, a new DT will be 
 assigned a sequence number that is always larger than that of all the 
 recovered DTs.
 To sum up, two negatives make a positive, but it's still good to fix the 
 issue. Please let me know if I've missed something here.
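
A compact, self-contained re-creation of the "two negatives make a positive" behaviour described above, using plain ints for sequence numbers instead of the real token and state-store classes:

{code}
// Hypothetical, simplified sketch: the persisted "last sequence number" can be
// stale (a renew wrote an old value back), but walking the recovered
// identifiers and keeping the maximum restores the correct counter before any
// new token is issued.
import java.util.Arrays;
import java.util.List;

public class DtSequenceRecoverySketch {
  private int delegationTokenSequenceNumber;

  void recover(int storedLastSeqNum, List<Integer> recoveredIdentifierSeqNums) {
    // Start from the possibly stale stored value, e.g. 1 after the sequence:
    // get DT 1 (seq 1), get DT 2 (seq 2), renew DT 1 (seq 1), restart RM.
    delegationTokenSequenceNumber = storedLastSeqNum;
    // Each recovered identifier bumps the counter if it is larger, so the
    // final value is the maximum over all recovered tokens (2 in the example).
    for (int seq : recoveredIdentifierSeqNums) {
      if (seq > delegationTokenSequenceNumber) {
        delegationTokenSequenceNumber = seq;
      }
    }
  }

  int nextSequenceNumber() {
    return ++delegationTokenSequenceNumber;   // 3 here, so no clash with DT 2
  }

  public static void main(String[] args) {
    DtSequenceRecoverySketch sketch = new DtSequenceRecoverySketch();
    sketch.recover(1, Arrays.asList(1, 2));
    System.out.println("next DT sequence number: " + sketch.nextSequenceNumber());
  }
}
{code}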



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3007) TestNMWebServices#testContainerLogs fails intermittently

2015-01-05 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-3007.
---
Resolution: Invalid

This issue is not reproducible in 2.7.0 or in trunk. Closing.

 TestNMWebServices#testContainerLogs fails intermittently
 

 Key: YARN-3007
 URL: https://issues.apache.org/jira/browse/YARN-3007
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor

 TestNMWebServices#testContainerLogs fails intermittently with JDK 7:
 {noformat}
 java.lang.AssertionError: Failed to create log dir
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogs(TestNMWebServices.java:336)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)