[jira] [Commented] (YARN-2997) NM keeps sending already-sent completed containers to RM until containers are removed from context

2015-01-08 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270347#comment-14270347
 ] 

Chengbing Liu commented on YARN-2997:
-

Thanks, [~jianhe]!

 NM keeps sending already-sent completed containers to RM until containers are 
 removed from context
 --

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.7.0

 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
 YARN-2997.5.patch, YARN-2997.patch


 We have seen a lot of the following in the RM log:
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by the NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container has already been released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager

2015-01-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270518#comment-14270518
 ] 

Rohith commented on YARN-3016:
--

It makes sense to me.

 (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in 
 CommonNodeLabelsManager
 -

 Key: YARN-3016
 URL: https://issues.apache.org/jira/browse/YARN-3016
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan

 We now have separate but similar implementations for add/remove/replace 
 labels on a node in CommonNodeLabelsManager; we should merge them into a single method 
 for easier modification and better readability.
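 A rough illustration of the merge (not an actual patch): the three per-node mutations 
 differ only in how they combine the existing label set with the incoming one, so one 
 internal method plus an operation switch could cover all of them. The {{LabelOp}} enum 
 and the plain map used as the label store below are invented for the sketch.
 {code}
 import java.util.*;

 public class MergedLabelOps {
   enum LabelOp { ADD, REMOVE, REPLACE }

   // Stand-in for CommonNodeLabelsManager's node -> labels bookkeeping.
   private final Map<String, Set<String>> nodeToLabels =
       new HashMap<String, Set<String>>();

   // Single internal method replacing internalAdd/Remove/ReplaceLabels.
   void internalUpdateLabelsOnNodes(Map<String, Set<String>> changes, LabelOp op) {
     for (Map.Entry<String, Set<String>> e : changes.entrySet()) {
       Set<String> current = nodeToLabels.get(e.getKey());
       if (current == null) {
         current = new HashSet<String>();
         nodeToLabels.put(e.getKey(), current);
       }
       switch (op) {
         case ADD:     current.addAll(e.getValue()); break;
         case REMOVE:  current.removeAll(e.getValue()); break;
         case REPLACE: current.clear(); current.addAll(e.getValue()); break;
       }
     }
   }
 }
 {code}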



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-08 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270522#comment-14270522
 ] 

Akira AJISAKA commented on YARN-2807:
-

+1, thank you [~iwasakims].

 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive serviceId [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state with 
 --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transition to active when automatic failover is 
 enabled, even with --forceactive.
 The option that does work is {{--forcemanual}}, but no place in the usage 
 describes this option. I think we should fix this.
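 For reference, the invocation that actually succeeds today (per the description, the 
 working flag is --forcemanual; rm2 is the example service id from above) would be:
 {code}
 yarn rmadmin -transitionToActive rm2 --forcemanual
 {code}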



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2956) Some yarn-site index linked pages are difficult to discover because are not in the side bar

2015-01-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270255#comment-14270255
 ] 

Jian He commented on YARN-2956:
---

[~iwasakims], thanks for working on this. Maybe also add a link in the side bar to 
hadoop-yarn-site/index.html? The link could be located in the YARN section of the 
side bar and be called "Overview".
I think it's fine to keep a full list of document indexes in the main index too?

 Some yarn-site index linked pages are difficult to discover because are not 
 in the side bar
 ---

 Key: YARN-2956
 URL: https://issues.apache.org/jira/browse/YARN-2956
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Remus Rusanu
Assignee: Masatake Iwasaki
Priority: Minor
  Labels: documentation
 Attachments: YARN-2956.1.patch


 The yarn-site index.apt.vm page is difficult to 'stumble upon' because the 
 hadoop.apache.org/ sidebar navigation does not link to it. One needs to know 
 the URL http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/ to 
 land on it.
 The links from the index page do not match the links from the side bar, so 
 only some pages are quickly accessible (from the sidebar).
 I propose that the links from index.apt.vm match the links from the YARN 
 side bar subsection (ideally through one single definition file, but I don't 
 understand the APT generation process well enough to call out how this can be 
 achieved).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes

2015-01-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270274#comment-14270274
 ] 

Yi Liu commented on YARN-2996:
--

Thanks [~zjshen] for review and commit.

 Refine fs operations in FileSystemRMStateStore and few fixes
 

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.7.0

 Attachments: YARN-2996.001.patch, YARN-2996.002.patch, 
 YARN-2996.003.patch, YARN-2996.004.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* Several places invoke {{fs.exists}} followed by 
 {{fs.getFileStatus}}; we can merge them to save one RPC call:
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
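 A minimal sketch of the merged form (the surrounding class and the use of a null return 
 are assumptions, not the actual patch): call {{getFileStatus}} once and treat 
 {{FileNotFoundException}} as the "does not exist" case, saving one RPC.
 {code}
 import java.io.FileNotFoundException;
 import java.io.IOException;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 class FsStatusSketch {
   // Returns the file status if the path exists, or null otherwise,
   // using a single RPC instead of exists() + getFileStatus().
   static FileStatus getFileStatusIfExists(FileSystem fs, Path path)
       throws IOException {
     try {
       return fs.getFileStatus(path);
     } catch (FileNotFoundException e) {
       return null; // equivalent to the exists() == false branch
     }
   }
 }
 {code}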
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 {{updateFile}} is not ideal either: it writes the file to _output\_file_.tmp, then 
 renames it to _output\_file_.new, then renames that to _output\_file_; we can save 
 one rename operation (see the sketch below).
 Also there is one unnecessary import, which we can remove.
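 A sketch of the reduced-rename {{updateFile}} (an assumption about the fix, not 
 necessarily what the patch does; {{fs}} is the store's FileSystem): write the data 
 straight to the .new path and rename once.
 {code}
 import java.io.IOException;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 class UpdateFileSketch {
   // Write directly to <output>.new, then rename over the target:
   // one write plus one rename instead of write + two renames.
   static void updateFile(FileSystem fs, Path outputPath, byte[] data)
       throws IOException {
     Path newPath = new Path(outputPath.getParent(), outputPath.getName() + ".new");
     FSDataOutputStream out = fs.create(newPath, true);
     try {
       out.write(data);
     } finally {
       out.close();
     }
     fs.delete(outputPath, false);   // rename does not overwrite on all file systems
     fs.rename(newPath, outputPath);
   }
 }
 {code}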



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3022) Expose Container resource information from NodeManager for monitoring

2015-01-08 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-3022:
---

 Summary: Expose Container resource information from NodeManager 
for monitoring
 Key: YARN-3022
 URL: https://issues.apache.org/jira/browse/YARN-3022
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


Along with exposing the resource consumption of each container (such as in YARN-2141), 
it's worth exposing the actual resource limit associated with each container to get better 
insight into YARN allocation and consumption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-810) Support CGroup ceiling enforcement on CPU

2015-01-08 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-810:
-
Attachment: YARN-810-6.patch

Updated the patch to fix the test failures.

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini
Assignee: Sandy Ryza
 Attachments: YARN-810-3.patch, YARN-810-4.patch, YARN-810-5.patch, 
 YARN-810-6.patch, YARN-810.patch, YARN-810.patch


 Problem statement:
 YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
 Containers are then allowed to request vcores between the minimum and maximum 
 defined in the yarn-site.xml.
 In the case where a single-threaded container requests 1 vcore, with a 
 pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
 the core it's using, provided that no other container is also using it. This 
 happens, even though the only guarantee that YARN/CGroups is making is that 
 the container will get at least 1/4th of the core.
 If a second container then comes along, the second container can take 
 resources from the first, provided that the first container is still getting 
 at least its fair share (1/4th).
 There are certain cases where this is desirable. There are also certain cases 
 where it might be desirable to have a hard limit on CPU usage, and not allow 
 the process to go above the specified resource requirement, even if it's 
 available.
 Here's an RFC that describes the problem in more detail:
 http://lwn.net/Articles/336127/
 Solution:
 As it happens, when CFS is used in combination with CGroups, you can enforce 
 a ceiling using two files in cgroups:
 {noformat}
 cpu.cfs_quota_us
 cpu.cfs_period_us
 {noformat}
 The usage of these two files is documented in more detail here:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
 Testing:
 I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
 it behaves as described above (it is a soft cap, and allows containers to use 
 more than they asked for). I then tested CFS CPU quotas manually with YARN.
 First, you can see that CFS is in use in the CGroup, based on the file names:
 {noformat}
 [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
 total 0
 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
 drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
 -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
 -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
 -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
 100000
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 /cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
 -1
 {noformat}
 Oddly, it appears that the cfs_period_us is set to .1s, not 1s.
 We can place processes in hard limits. I have process 4370 running YARN 
 container container_1371141151815_0003_01_03 on a host. By default, it's 
 running at ~300% cpu usage.
 {noformat}
 CPU
 4370 criccomi  20   0 1157m 551m  14m S 240.3  0.8  87:10.91 ...
 {noformat}
 When I set the CFS quota:
 {noformat}
 echo 1000 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
  CPU
 4370 criccomi  20   0 1157m 563m  14m S  1.0  0.8  90:08.39 ...
 {noformat}
 It drops to 1% usage, and you can see the box has room to spare:
 {noformat}
 Cpu(s):  2.4%us,  1.0%sy,  0.0%ni, 92.2%id,  4.2%wa,  0.0%hi,  0.1%si, 
 0.0%st
 {noformat}
 Turning the quota back to -1:
 {noformat}
 echo -1 > 
 /cgroup/cpu/hadoop-yarn/container_1371141151815_0003_01_03/cpu.cfs_quota_us
 {noformat}
 Burns the cores again:
 {noformat}
 Cpu(s): 11.1%us,  1.7%sy,  0.0%ni, 83.9%id,  3.1%wa,  0.0%hi,  0.2%si, 
 0.0%st
 CPU
 4370 criccomi  20   0 1157m 563m  14m S 253.9  0.8  89:32.31 ...
 {noformat}
 On my dev box, I was testing CGroups by running a python process eight times, 
 to burn through all the cores, since it was doing as described above (giving 
 extra CPU to the process, even with a cpu.shares limit). Toggling the 
 cfs_quota_us seems to enforce a hard limit.
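 For illustration only (an assumption about how an NM-side implementation could look, not 
 the attached patch): turning a container's vcore share into a hard cap only requires 
 writing the two files above; the vcore-to-pcore scaling and the helper below are made up.
 {code}
 import java.io.FileWriter;
 import java.io.IOException;

 class CfsCeilingSketch {
   // Cap a container cgroup at its vcore share:
   // quota = period * containerVcores * nodePcores / nodeVcores
   static void setCpuCeiling(String containerCgroupDir, int containerVcores,
       int nodeVcores, int nodePcores) throws IOException {
     long periodUs = 100000L;   // 0.1s, matching the default observed above
     long quotaUs = periodUs * containerVcores * nodePcores / nodeVcores;
     write(containerCgroupDir + "/cpu.cfs_period_us", Long.toString(periodUs));
     write(containerCgroupDir + "/cpu.cfs_quota_us", Long.toString(quotaUs));
   }

   private static void write(String file, String value) throws IOException {
     FileWriter w = new FileWriter(file);
     try {
       w.write(value);
     } finally {
       w.close();
     }
   }
 }
 {code}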
 

[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-01-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-313:

Attachment: YARN-313-v3.patch

Resynced the patch against the latest trunk.

 Add Admin API for supporting node resource configuration in command line
 

 Key: YARN-313
 URL: https://issues.apache.org/jira/browse/YARN-313
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
 YARN-313-v2.patch, YARN-313-v3.patch


 We should provide an admin interface, e.g. {{yarn rmadmin -refreshResources}}, 
 to support changing a node's resources as specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3014) Replaces labels on a host should update all NM's labels on that host

2015-01-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3014:
-
Attachment: YARN-3014-2.patch

Thanks for [~jianhe]'s review; addressed the comments and also updated test cases to 
cover them.

 Replaces labels on a host should update all NM's labels on that host
 

 Key: YARN-3014
 URL: https://issues.apache.org/jira/browse/YARN-3014
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3014-1.patch, YARN-3014-2.patch


 An admin can specify labels either on a host (by running {{yarn rmadmin 
 -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn 
 rmadmin -replaceLabelsOnNode host1:port,label1}}).
 If a user has specified label=x on a NM (instead of on the host), and later sets 
 label=y on the NM's host, the NM's label should update to y as well.
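 Roughly the invariant the fix needs (a self-contained sketch, not the attached patch; 
 the host/NM bookkeeping below is invented): replacing labels at the host level must also 
 rewrite the effective label set of every NM already known on that host.
 {code}
 import java.util.*;

 class HostLabelSketch {
   // host -> labels specified at host level
   private final Map<String, Set<String>> hostLabels =
       new HashMap<String, Set<String>>();
   // "host:port" -> labels effective on each NM of that host
   private final Map<String, Set<String>> nmLabels =
       new HashMap<String, Set<String>>();

   // Replace labels on a whole host and propagate them to every NM on it.
   void replaceLabelsOnHost(String host, Set<String> labels) {
     hostLabels.put(host, new HashSet<String>(labels));
     for (Map.Entry<String, Set<String>> e : nmLabels.entrySet()) {
       if (e.getKey().startsWith(host + ":")) {
         e.setValue(new HashSet<String>(labels)); // label=x on the NM becomes label=y
       }
     }
   }
 }
 {code}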



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3023) Race condition in ZKRMStateStore#createWithRetries from ZooKeeper cause RM crash

2015-01-08 Thread zhihai xu (JIRA)
zhihai xu created YARN-3023:
---

 Summary: Race condition in ZKRMStateStore#createWithRetries from 
ZooKeeper cause RM crash 
 Key: YARN-3023
 URL: https://issues.apache.org/jira/browse/YARN-3023
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu


A race condition in ZKRMStateStore#createWithRetries from ZooKeeper can cause an RM 
crash.

The sequence for the race condition is the following:
1. The RM stores the attempt state to ZK by calling createWithRetries
{code}
2015-01-06 12:37:35,343 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Storing attempt: AppId: application_1418914202950_42363 AttemptId: 
appattempt_1418914202950_42363_01 MasterContainer: Container: [ContainerId: 
container_1418914202950_42363_01_01,
{code}

2. Unluckily, a ConnectionLoss for the ZK session happened at the same time as the RM 
stored the attempt state to ZK.
The ZooKeeper server created the node and stored the data successfully, but due 
to the ConnectionLoss, the RM didn't know the operation (createWithRetries) had 
succeeded.
{code}
2015-01-06 12:37:36,102 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss
{code}

3. The RM retried storing the attempt state to ZK after one second
{code}
2015-01-06 12:37:36,104 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying 
operation on ZK. Retry no. 1
{code}

4. During the one-second interval, the ZK session was reconnected.
{code}
2015-01-06 12:37:36,210 INFO org.apache.zookeeper.ClientCnxn: Socket connection 
established initiating session
2015-01-06 12:37:36,213 INFO org.apache.zookeeper.ClientCnxn: Session 
establishment complete on server, sessionid = 0x44a9166eb2d12cb, negotiated 
timeout = 1
{code}

5. Because the node was created successfully at ZooKeeper in the first 
try (runWithCheck),
the second try fails with a NodeExists KeeperException
{code}
2015-01-06 12:37:37,116 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists
2015-01-06 12:37:37,118 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
out ZK retries. Giving up!
{code}

6. This NodeExists KeeperException causes the AppAttempt store operation to fail in 
RMStateStore
{code}
2015-01-06 12:37:37,118 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
storing appAttempt: appattempt_1418914202950_42363_01
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists
{code}

7. RMStateStore sends an RMFatalEventType.STATE_STORE_OP_FAILED event to the 
ResourceManager
{code}
  protected void notifyStoreOperationFailed(Exception failureCause) {
RMFatalEventType type;
if (failureCause instanceof StoreFencedException) {
  type = RMFatalEventType.STATE_STORE_FENCED;
} else {
  type = RMFatalEventType.STATE_STORE_OP_FAILED;
}
rmDispatcher.getEventHandler().handle(new RMFatalEvent(type, failureCause));
  }
{code}

8. The ResourceManager kills itself after receiving the STATE_STORE_OP_FAILED 
RMFatalEvent.
{code}
2015-01-06 12:37:37,128 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists
2015-01-06 12:37:37,138 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3014) Replaces labels on a host should update all NM's labels on that host

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270475#comment-14270475
 ] 

Hadoop QA commented on YARN-3014:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691003/YARN-3014-2.patch
  against trunk revision ae91b13.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6287//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6287//console

This message is automatically generated.

 Replaces labels on a host should update all NM's labels on that host
 

 Key: YARN-3014
 URL: https://issues.apache.org/jira/browse/YARN-3014
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3014-1.patch, YARN-3014-2.patch


 An admin can specify labels either on a host (by running {{yarn rmadmin 
 -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn 
 rmadmin -replaceLabelsOnNode host1:port,label1}}).
 If a user has specified label=x on a NM (instead of on the host), and later sets 
 label=y on the NM's host, the NM's label should update to y as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-810) Support CGroup ceiling enforcement on CPU

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270492#comment-14270492
 ] 

Hadoop QA commented on YARN-810:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691002/YARN-810-6.patch
  against trunk revision ae91b13.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6286//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6286//console

This message is automatically generated.

 Support CGroup ceiling enforcement on CPU
 -

 Key: YARN-810
 URL: https://issues.apache.org/jira/browse/YARN-810
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.1.0-beta, 2.0.5-alpha
Reporter: Chris Riccomini
Assignee: Sandy Ryza
 Attachments: YARN-810-3.patch, YARN-810-4.patch, YARN-810-5.patch, 
 YARN-810-6.patch, YARN-810.patch, YARN-810.patch


 Problem statement:
 YARN currently lets you define an NM's pcore count, and a pcore:vcore ratio. 
 Containers are then allowed to request vcores between the minimum and maximum 
 defined in the yarn-site.xml.
 In the case where a single-threaded container requests 1 vcore, with a 
 pcore:vcore ratio of 1:4, the container is still allowed to use up to 100% of 
 the core it's using, provided that no other container is also using it. This 
 happens, even though the only guarantee that YARN/CGroups is making is that 
 the container will get at least 1/4th of the core.
 If a second container then comes along, the second container can take 
 resources from the first, provided that the first container is still getting 
 at least its fair share (1/4th).
 There are certain cases where this is desirable. There are also certain cases 
 where it might be desirable to have a hard limit on CPU usage, and not allow 
 the process to go above the specified resource requirement, even if it's 
 available.
 Here's an RFC that describes the problem in more detail:
 http://lwn.net/Articles/336127/
 Solution:
 As it happens, when CFS is used in combination with CGroups, you can enforce 
 a ceiling using two files in cgroups:
 {noformat}
 cpu.cfs_quota_us
 cpu.cfs_period_us
 {noformat}
 The usage of these two files is documented in more detail here:
 https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
 Testing:
 I have tested YARN CGroups using the 2.0.5-alpha implementation. By default, 
 it behaves as described above (it is a soft cap, and allows containers to use 
 more than they asked for). I then tested CFS CPU quotas manually with YARN.
 First, you can see that CFS is in use in the CGroup, based on the file names:
 {noformat}
 [criccomi@eat1-qa464 ~]$ sudo -u app ls -l /cgroup/cpu/hadoop-yarn/
 total 0
 -r--r--r-- 1 app app 0 Jun 13 16:46 cgroup.procs
 drwxr-xr-x 2 app app 0 Jun 13 17:08 container_1371141151815_0004_01_02
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.cfs_quota_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_period_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.rt_runtime_us
 -rw-r--r-- 1 app app 0 Jun 13 16:46 cpu.shares
 -r--r--r-- 1 app app 0 Jun 13 16:46 cpu.stat
 -rw-r--r-- 1 app app 0 Jun 13 16:46 notify_on_release
 -rw-r--r-- 1 app app 0 Jun 13 16:46 tasks
 [criccomi@eat1-qa464 ~]$ sudo -u app cat
 

[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270510#comment-14270510
 ] 

Hadoop QA commented on YARN-313:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691018/YARN-313-v3.patch
  against trunk revision ae91b13.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6288//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6288//console

This message is automatically generated.

 Add Admin API for supporting node resource configuration in command line
 

 Key: YARN-313
 URL: https://issues.apache.org/jira/browse/YARN-313
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: YARN-313-sample.patch, YARN-313-v1.patch, 
 YARN-313-v2.patch, YARN-313-v3.patch


 We should provide an admin interface, e.g. {{yarn rmadmin -refreshResources}}, 
 to support changing a node's resources as specified in a config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3022) Expose Container resource information from NodeManager for monitoring

2015-01-08 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3022:

Attachment: YARN-3022.001.patch

Initial patch based on YARN-2984 which adds metrics for Containers

 Expose Container resource information from NodeManager for monitoring
 -

 Key: YARN-3022
 URL: https://issues.apache.org/jira/browse/YARN-3022
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3022.001.patch


 Along with exposing the resource consumption of each container (such as in 
 YARN-2141), it's worth exposing the actual resource limit associated with each 
 container to get better insight into YARN allocation and consumption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270502#comment-14270502
 ] 

Hadoop QA commented on YARN-2637:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691012/YARN-2637.36.patch
  against trunk revision ae91b13.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6289//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6289//console

This message is automatically generated.

 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, 
 YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app can be 
 activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that 
 can be launched is 200, and if each AM actually uses 5M (> minimum_allocation), all 
 apps can still be activated, and they will occupy all of the queue's resources 
 instead of only max_am_resource_percent of the queue.
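 A rough sketch of the direction the title suggests (checking AM resource rather than 
 application counts, for both the queue and the user); the method and parameter names 
 below are placeholders, not the patch's actual fields.
 {code}
 import org.apache.hadoop.yarn.api.records.Resource;
 import org.apache.hadoop.yarn.util.resource.Resources;

 class AmLimitSketch {
   // Activate an application only while both the queue-level and the user-level
   // AM resource usage would stay within
   // max_am_resource = queue_max_capacity * maximum_am_resource_percent.
   static boolean canActivate(Resource amOfApp,
       Resource queueAmUsed, Resource queueAmLimit,
       Resource userAmUsed, Resource userAmLimit) {
     Resource queueIfActivated = Resources.add(queueAmUsed, amOfApp);
     Resource userIfActivated = Resources.add(userAmUsed, amOfApp);
     return Resources.fitsIn(queueIfActivated, queueAmLimit)
         && Resources.fitsIn(userIfActivated, userAmLimit);
   }
 }
 {code}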



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3023) Race condition in ZKRMStateStore#createWithRetries from ZooKeeper cause RM crash

2015-01-08 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270526#comment-14270526
 ] 

Rohith commented on YARN-3023:
--

Which version of Hadoop are you using? In trunk this is handled: if the node 
already exists, then ZKRMStateStore won't throw NodeExists.
{code}
catch (KeeperException ke) {
  if (ke.code() == Code.NODEEXISTS) {
    LOG.info("znode already exists!");
    return null;
  }
  // other code
}
{code}

 Race condition in ZKRMStateStore#createWithRetries from ZooKeeper cause RM 
 crash 
 -

 Key: YARN-3023
 URL: https://issues.apache.org/jira/browse/YARN-3023
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu

 A race condition in ZKRMStateStore#createWithRetries from ZooKeeper can cause an RM 
 crash.
 The sequence for the race condition is the following:
 1. The RM stores the attempt state to ZK by calling createWithRetries
 {code}
 2015-01-06 12:37:35,343 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Storing attempt: AppId: application_1418914202950_42363 AttemptId: 
 appattempt_1418914202950_42363_01 MasterContainer: Container: 
 [ContainerId: container_1418914202950_42363_01_01,
 {code}
 2. Unluckily, a ConnectionLoss for the ZK session happened at the same time as 
 the RM stored the attempt state to ZK.
 The ZooKeeper server created the node and stored the data successfully, but 
 due to the ConnectionLoss, the RM didn't know the operation (createWithRetries) had 
 succeeded.
 {code}
 2015-01-06 12:37:36,102 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss
 {code}
 3. The RM retried storing the attempt state to ZK after one second
 {code}
 2015-01-06 12:37:36,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Retrying operation on ZK. Retry no. 1
 {code}
 4. During the one-second interval, the ZK session was reconnected.
 {code}
 2015-01-06 12:37:36,210 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established initiating session
 2015-01-06 12:37:36,213 INFO org.apache.zookeeper.ClientCnxn: Session 
 establishment complete on server, sessionid = 0x44a9166eb2d12cb, negotiated 
 timeout = 1
 {code}
 5. Because the node was created successfully at ZooKeeper in the first 
 try (runWithCheck),
 the second try fails with a NodeExists KeeperException
 {code}
 2015-01-06 12:37:37,116 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 2015-01-06 12:37:37,118 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 {code}
 6. This NodeExists KeeperException causes the AppAttempt store operation to fail in 
 RMStateStore
 {code}
 2015-01-06 12:37:37,118 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 storing appAttempt: appattempt_1418914202950_42363_01
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 {code}
 7. RMStateStore sends an RMFatalEventType.STATE_STORE_OP_FAILED event to the 
 ResourceManager
 {code}
   protected void notifyStoreOperationFailed(Exception failureCause) {
 RMFatalEventType type;
 if (failureCause instanceof StoreFencedException) {
   type = RMFatalEventType.STATE_STORE_FENCED;
 } else {
   type = RMFatalEventType.STATE_STORE_OP_FAILED;
 }
 rmDispatcher.getEventHandler().handle(new RMFatalEvent(type, 
 failureCause));
   }
 {code}
 8. The ResourceManager kills itself after receiving the STATE_STORE_OP_FAILED 
 RMFatalEvent.
 {code}
 2015-01-06 12:37:37,128 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 2015-01-06 12:37:37,138 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive

2015-01-08 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-2807:

Hadoop Flags: Reviewed

 Option --forceactive not works as described in usage of yarn rmadmin 
 -transitionToActive
 

 Key: YARN-2807
 URL: https://issues.apache.org/jira/browse/YARN-2807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, resourcemanager
Reporter: Wangda Tan
Assignee: Masatake Iwasaki
Priority: Minor
 Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch


 Currently the help message of yarn rmadmin -transitionToActive is:
 {code}
 transitionToActive: incorrect number of arguments
 Usage: HAAdmin [-transitionToActive serviceId [--forceactive]]
 {code}
 But --forceactive does not work as expected. When transitioning the RM state with 
 --forceactive:
 {code}
 yarn rmadmin -transitionToActive rm2 --forceactive
 Automatic failover is enabled for 
 org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e
 Refusing to manually manage HA state, since it may cause
 a split-brain scenario or other incorrect state.
 If you are very sure you know what you are doing, please
 specify the forcemanual flag.
 {code}
 As shown above, we still cannot transition to active when automatic failover is 
 enabled, even with --forceactive.
 The option that does work is {{--forcemanual}}, but no place in the usage 
 describes this option. I think we should fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3023) Race condition in ZKRMStateStore#createWithRetries from ZooKeeper cause RM crash

2015-01-08 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270581#comment-14270581
 ] 

zhihai xu commented on YARN-3023:
-

Yes, you are right. The issue is the same as YARN-2721.

 Race condition in ZKRMStateStore#createWithRetries from ZooKeeper cause RM 
 crash 
 -

 Key: YARN-3023
 URL: https://issues.apache.org/jira/browse/YARN-3023
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu

 A race condition in ZKRMStateStore#createWithRetries from ZooKeeper can cause an RM 
 crash.
 The sequence for the race condition is the following:
 1. The RM stores the attempt state to ZK by calling createWithRetries
 {code}
 2015-01-06 12:37:35,343 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Storing attempt: AppId: application_1418914202950_42363 AttemptId: 
 appattempt_1418914202950_42363_01 MasterContainer: Container: 
 [ContainerId: container_1418914202950_42363_01_01,
 {code}
 2. Unluckily, a ConnectionLoss for the ZK session happened at the same time as 
 the RM stored the attempt state to ZK.
 The ZooKeeper server created the node and stored the data successfully, but 
 due to the ConnectionLoss, the RM didn't know the operation (createWithRetries) had 
 succeeded.
 {code}
 2015-01-06 12:37:36,102 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss
 {code}
 3. The RM retried storing the attempt state to ZK after one second
 {code}
 2015-01-06 12:37:36,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Retrying operation on ZK. Retry no. 1
 {code}
 4. During the one-second interval, the ZK session was reconnected.
 {code}
 2015-01-06 12:37:36,210 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established initiating session
 2015-01-06 12:37:36,213 INFO org.apache.zookeeper.ClientCnxn: Session 
 establishment complete on server, sessionid = 0x44a9166eb2d12cb, negotiated 
 timeout = 1
 {code}
 5. Because the node was created successfully at ZooKeeper in the first 
 try (runWithCheck),
 the second try fails with a NodeExists KeeperException
 {code}
 2015-01-06 12:37:37,116 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 2015-01-06 12:37:37,118 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 {code}
 6. This NodeExists KeeperException causes the AppAttempt store operation to fail in 
 RMStateStore
 {code}
 2015-01-06 12:37:37,118 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 storing appAttempt: appattempt_1418914202950_42363_01
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 {code}
 7. RMStateStore sends an RMFatalEventType.STATE_STORE_OP_FAILED event to the 
 ResourceManager
 {code}
   protected void notifyStoreOperationFailed(Exception failureCause) {
 RMFatalEventType type;
 if (failureCause instanceof StoreFencedException) {
   type = RMFatalEventType.STATE_STORE_FENCED;
 } else {
   type = RMFatalEventType.STATE_STORE_OP_FAILED;
 }
 rmDispatcher.getEventHandler().handle(new RMFatalEvent(type, 
 failureCause));
   }
 {code}
 8. The ResourceManager kills itself after receiving the STATE_STORE_OP_FAILED 
 RMFatalEvent.
 {code}
 2015-01-06 12:37:37,128 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 2015-01-06 12:37:37,138 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3023) Race condition in ZKRMStateStore#createWithRetries from ZooKeeper cause RM crash

2015-01-08 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu resolved YARN-3023.
-
Resolution: Duplicate

 Race condition in ZKRMStateStore#createWithRetries from ZooKeeper cause RM 
 crash 
 -

 Key: YARN-3023
 URL: https://issues.apache.org/jira/browse/YARN-3023
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu

 A race condition in ZKRMStateStore#createWithRetries from ZooKeeper can cause an RM 
 crash.
 The sequence for the race condition is the following:
 1. The RM stores the attempt state to ZK by calling createWithRetries
 {code}
 2015-01-06 12:37:35,343 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Storing attempt: AppId: application_1418914202950_42363 AttemptId: 
 appattempt_1418914202950_42363_01 MasterContainer: Container: 
 [ContainerId: container_1418914202950_42363_01_01,
 {code}
 2. Unluckily, a ConnectionLoss for the ZK session happened at the same time as 
 the RM stored the attempt state to ZK.
 The ZooKeeper server created the node and stored the data successfully, but 
 due to the ConnectionLoss, the RM didn't know the operation (createWithRetries) had 
 succeeded.
 {code}
 2015-01-06 12:37:36,102 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss
 {code}
 3. The RM retried storing the attempt state to ZK after one second
 {code}
 2015-01-06 12:37:36,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Retrying operation on ZK. Retry no. 1
 {code}
 4. During the one-second interval, the ZK session was reconnected.
 {code}
 2015-01-06 12:37:36,210 INFO org.apache.zookeeper.ClientCnxn: Socket 
 connection established initiating session
 2015-01-06 12:37:36,213 INFO org.apache.zookeeper.ClientCnxn: Session 
 establishment complete on server, sessionid = 0x44a9166eb2d12cb, negotiated 
 timeout = 1
 {code}
 5. Because the node was created successfully at ZooKeeper in the first 
 try (runWithCheck),
 the second try fails with a NodeExists KeeperException
 {code}
 2015-01-06 12:37:37,116 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 Exception while executing a ZK operation.
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 2015-01-06 12:37:37,118 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
 out ZK retries. Giving up!
 {code}
 6. This NodeExists KeeperException causes the AppAttempt store operation to fail in 
 RMStateStore
 {code}
 2015-01-06 12:37:37,118 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
 storing appAttempt: appattempt_1418914202950_42363_01
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 {code}
 7. RMStateStore sends an RMFatalEventType.STATE_STORE_OP_FAILED event to the 
 ResourceManager
 {code}
   protected void notifyStoreOperationFailed(Exception failureCause) {
 RMFatalEventType type;
 if (failureCause instanceof StoreFencedException) {
   type = RMFatalEventType.STATE_STORE_FENCED;
 } else {
   type = RMFatalEventType.STATE_STORE_OP_FAILED;
 }
 rmDispatcher.getEventHandler().handle(new RMFatalEvent(type, 
 failureCause));
   }
 {code}
 8. The ResourceManager kills itself after receiving the STATE_STORE_OP_FAILED 
 RMFatalEvent.
 {code}
 2015-01-06 12:37:37,128 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists
 2015-01-06 12:37:37,138 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
 status 1
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3024) LocalizerRunner should give DIE action when all resources are localized

2015-01-08 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-3024:

Attachment: YARN-3024.02.patch

Fixed tests accordingly.

 LocalizerRunner should give DIE action when all resources are localized
 ---

 Key: YARN-3024
 URL: https://issues.apache.org/jira/browse/YARN-3024
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3024.01.patch, YARN-3024.02.patch


 We have observed that {{LocalizerRunner}} always gives a LIVE action at the 
 end of the localization process.
 The problem is that {{findNextResource()}} can return null even when {{pending}} 
 was not empty prior to the call. This method removes localized resources from 
 {{pending}}, therefore we should check the return value and give a DIE action 
 when it returns null.
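 A minimal sketch of that check (the real LocalizerRunner heartbeat handling is more 
 involved; the enum below is a local stand-in for YARN's LocalizerAction):
 {code}
 class LocalizerHeartbeatSketch {
   enum LocalizerAction { LIVE, DIE }

   // findNextResource() may prune already-localized entries and return null;
   // in that case the localizer should be told to DIE rather than kept LIVE.
   static LocalizerAction actionFor(Object nextResource) {
     return nextResource == null ? LocalizerAction.DIE : LocalizerAction.LIVE;
   }
 }
 {code}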



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2141) [Umbrella] Capture container and node resource consumption

2015-01-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270535#comment-14270535
 ] 

Vinod Kumar Vavilapalli commented on YARN-2141:
---

One other related effort is YARN-2928 which is also planning to obtain and send 
information about container resource-usage to a per-application aggregator. We 
should try to unify these..

 [Umbrella] Capture container and node resource consumption
 --

 Key: YARN-2141
 URL: https://issues.apache.org/jira/browse/YARN-2141
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Carlo Curino
Priority: Minor

 Collecting per-container and per-node resource consumption statistics in a 
 fairly granular manner, and making them available to both infrastructure code 
 (e.g., schedulers) and users (e.g., AMs, or users directly via webapps), can 
 facilitate several kinds of performance work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage

2015-01-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270536#comment-14270536
 ] 

Vinod Kumar Vavilapalli commented on YARN-2984:
---

Linking related efforts.

One other related effort is YARN-2928 which is also planning to obtain and send 
information about container resource-usage to a per-application aggregator. We 
should try to unify these..

 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-08 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.36.patch

Should be down to one failing test, let's see

 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.36.patch, 
 YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app can be 
 activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
 resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that 
 can be launched is 200, and if each AM actually uses 5M (> minimum_allocation), all 
 apps can still be activated, and they will occupy all of the queue's resources 
 instead of only max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3019) Enable RM work-preserving restart by default

2015-01-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270087#comment-14270087
 ] 

Allen Wittenauer commented on YARN-3019:


I'm not in favor of this going into branch-2.  It's a fundamental change to 
operating expectations that may have a significant impact on capacity.

 Enable RM work-preserving restart by default 
 -

 Key: YARN-3019
 URL: https://issues.apache.org/jira/browse/YARN-3019
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 The proposal is to set 
 yarn.resourcemanager.work-preserving-recovery.enabled to true by default. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3019) Enable RM work-preserving restart by default

2015-01-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270100#comment-14270100
 ] 

Jian He commented on YARN-3019:
---

To clarify: this jira is to flip the recovery mode from non-work-preserving recovery to 
work-preserving recovery. The feature itself remains disabled, i.e. 
yarn.resourcemanager.recovery.enabled remains false. Updating the 
description.

Further, I'm also thinking to enable the feature itself by default and use the 
local FS as the default file system. I'm OK to do this only on trunk. That will 
uncover bugs if any. I can open a separate jira for this.

 Enable RM work-preserving restart by default 
 -

 Key: YARN-3019
 URL: https://issues.apache.org/jira/browse/YARN-3019
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 The proposal is to set 
 yarn.resourcemanager.work-preserving-recovery.enabled to true by default. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2015-01-08 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270110#comment-14270110
 ] 

Chen He commented on YARN-1680:
---

Thank you for the comments, [~jlowe]. [~cwelch] created YARN-2848, which discusses 
blacklisted nodes and label scheduling. I will work on a patch that fixes 
the blacklisted node case. 

 availableResources sent to applicationMaster in heartbeat should exclude 
 blacklistedNodes free memory.
 --

 Key: YARN-1680
 URL: https://issues.apache.org/jira/browse/YARN-1680
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0, 2.3.0
 Environment: SuSE 11 SP2 + Hadoop-2.3 
Reporter: Rohith
Assignee: Chen He
 Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
 YARN-1680-v2.patch, YARN-1680.patch


 There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster 
 slow start is set to 1.
 A job is running whose reducer tasks occupy 29GB of the cluster. One 
 NodeManager (NM-4) becomes unstable (3 map tasks got killed), so the MRAppMaster 
 blacklists the unstable NodeManager (NM-4). All reducer tasks are running in the 
 cluster now.
 The MRAppMaster does not preempt the reducers because, for the reducer preemption 
 calculation, the headroom includes the blacklisted node's memory. This makes the 
 job hang forever (the ResourceManager does not assign any new containers on 
 blacklisted nodes, but the availableResources it returns still counts the 
 cluster's free memory). 
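
A small illustration of the headroom gap described above; the split of the remaining free memory is assumed, since the description does not give it explicitly, and the class name is illustrative.

{code}
public class HeadroomExample {
  public static void main(String[] args) {
    int clusterMb = 4 * 8 * 1024;      // 4 NodeManagers with 8GB each
    int usedMb = 29 * 1024;            // reducers occupy 29GB
    int blacklistedFreeMb = 3 * 1024;  // assume all remaining free memory sits on NM-4

    int reportedHeadroomMb = clusterMb - usedMb;                    // what the AM sees today
    int usableHeadroomMb = reportedHeadroomMb - blacklistedFreeMb;  // what it can really use

    // With usable headroom at 0, the AM should preempt reducers; the reported 3GB hides that.
    System.out.println("reported=" + reportedHeadroomMb + "MB, usable=" + usableHeadroomMb + "MB");
  }
}
{code}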



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly

2015-01-08 Thread Harsh J (JIRA)
Harsh J created YARN-3021:
-

 Summary: YARN's delegation-token handling disallows certain trust 
setups to operate properly
 Key: YARN-3021
 URL: https://issues.apache.org/jira/browse/YARN-3021
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Harsh J


Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and 
B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
clusters.

Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as 
it attempts a renewDelegationToken(…) synchronously during application 
submission (to validate the managed token before it adds it to a scheduler for 
automatic renewal). The call obviously fails because realm B will not trust A's 
credentials (here, the RM's principal is the renewer).

In the 1.x JobTracker the same call is present, but it is done asynchronously 
and once the renewal attempt failed we simply ceased to schedule any further 
attempts of renewals, rather than fail the job immediately.

We should change the logic such that we attempt the renewal but go easy on the 
failure and skip the scheduling alone, rather than bubble back an error to the 
client, failing the app submission. This way the old behaviour is retained.
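
A hedged sketch of the proposed behaviour (deliberately not the actual DelegationTokenRenewer code): attempt the renewal once to validate the token, and on failure skip scheduling further renewals instead of failing the submission. The interface and class names are illustrative only.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TolerantRenewSketch {
  interface Renewable {
    long renew() throws IOException; // stands in for renewDelegationToken(...)
  }

  static void tryRenewAndSchedule(Renewable token, List<Renewable> renewalQueue) {
    try {
      token.renew();           // validate the token once at submission time
      renewalQueue.add(token);  // schedule automatic renewal only if validation worked
    } catch (IOException e) {
      // e.g. the remote realm does not trust the RM's principal: log and move on
      // instead of bubbling the error back and failing the app submission.
      System.err.println("Skipping renewal scheduling: " + e.getMessage());
    }
  }

  public static void main(String[] args) {
    List<Renewable> queue = new ArrayList<>();
    tryRenewAndSchedule(() -> { throw new IOException("no trust path to realm B"); }, queue);
    System.out.println("tokens scheduled for renewal: " + queue.size()); // 0, but no failure
  }
}
{code}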



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.

2015-01-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270116#comment-14270116
 ] 

Jian He commented on YARN-2637:
---

Quick note: YARN-3010 fixed the findbugs warning, so the findbugs exclusion in 
the patch may not be needed.

 maximum-am-resource-percent could be respected for both LeafQueue/User when 
 trying to activate applications.
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, 
 YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, 
 YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, 
 YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, 
 YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it will check whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext();) {
   FiCaSchedulerApp application = i.next();
   
   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }
   
   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() + 
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example is,
 If a queue has capacity = 1G, max_am_resource_percent  = 0.2, the maximum 
 resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be 
 launched is 200, and if user uses 5M for each AM (> minimum_allocation). All 
 apps can still be activated, and it will occupy all resource of a queue 
 instead of only a max_am_resource_percent of a queue.
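
Below is a small, self-contained illustration (not from any attached patch) of the arithmetic in the example above; the numbers are assumed from the description (1G queue, 20% AM share, 1M minimum allocation, 5M per AM), and the class name is illustrative. It shows why counting AMs against the minimum allocation lets the AMs consume the whole queue, which is what checking AM resource per LeafQueue and per user would prevent.

{code}
public class AmLimitExample {
  public static void main(String[] args) {
    int queueCapacityMb = 1000;        // queue capacity = 1G
    double maxAmResourcePercent = 0.2; // maximum-am-resource-percent
    int minimumAllocationMb = 1;       // minimum_allocation = 1M
    int actualAmSizeMb = 5;            // each AM really asks for 5M

    int maxAmResourceMb = (int) (queueCapacityMb * maxAmResourcePercent); // 200M
    int maxAmCount = maxAmResourceMb / minimumAllocationMb;               // 200 AMs

    // Counting AMs against the minimum allocation lets the AMs alone use 1000M,
    // i.e. the whole queue, instead of the intended 200M AM share.
    System.out.println("AM resource limit: " + maxAmResourceMb + "M");
    System.out.println("AMs allowed by count: " + maxAmCount);
    System.out.println("Resource those AMs actually use: " + (maxAmCount * actualAmSizeMb) + "M");
  }
}
{code}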



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3019) Enable RM work-preserving restart by default

2015-01-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3019:
--
Description: The proposal is to set 
yarn.resourcemanager.work-preserving-recovery.enabled to true by default   to 
flip recovery mode to work-preserving recovery from non-work-preserving 
recovery.   (was: The proposal is to set 
yarn.resourcemanager.work-preserving-recovery.enabled to true by default. )

 Enable RM work-preserving restart by default 
 -

 Key: YARN-3019
 URL: https://issues.apache.org/jira/browse/YARN-3019
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 The proposal is to set 
 yarn.resourcemanager.work-preserving-recovery.enabled to true by default   
 to flip recovery mode to work-preserving recovery from non-work-preserving 
 recovery. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3014) Replaces labels on a host should update all NM's labels on that host

2015-01-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270179#comment-14270179
 ] 

Jian He commented on YARN-3014:
---

 addLabel and removeLabel on the Host should not do replaceLabel on the NM?


 Replaces labels on a host should update all NM's labels on that host
 

 Key: YARN-3014
 URL: https://issues.apache.org/jira/browse/YARN-3014
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3014-1.patch


 Admin can either specify labels on a host (by running {{yarn rmadmin 
 -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn 
 rmadmin -replaceLabelsOnNode host1:port,label1}}).
 If a user has specified label=x on a NM (instead of the host), and later sets 
 label=y on the NM's host, the NM's label should update to y as well.
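
A hedged, map-based sketch of the expected propagation (not the actual CommonNodeLabelsManager code): replacing labels on a host overwrites the labels of every NM registered under that host, including an NM whose labels were set individually. All names here are illustrative.

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HostLabelPropagation {
  public static void main(String[] args) {
    Map<String, Set<String>> labelsByNode = new HashMap<>();
    labelsByNode.put("host1:45454", new HashSet<>(Arrays.asList("x"))); // label set on the NM

    // Admin later replaces labels on the host itself (host1 -> y).
    String host = "host1";
    Set<String> hostLabels = new HashSet<>(Arrays.asList("y"));
    labelsByNode.replaceAll((node, labels) ->
        node.equals(host) || node.startsWith(host + ":") ? hostLabels : labels);

    System.out.println(labelsByNode); // {host1:45454=[y]} -- the NM picked up the host's label
  }
}
{code}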



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269217#comment-14269217
 ] 

Hudson commented on YARN-3010:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #67 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/67/])
YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi 
Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3010.001.patch, YARN-3010.002.patch


 A new findbugs issue was reported recently in latest trunk: 
 {quote}
 IS: Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
  locked 91% of time
 {quote}
 https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269223#comment-14269223
 ] 

Hudson commented on YARN-2230:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #67 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/67/])
YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by 
Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this exception is 
 handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 non-obvious to discover why a job does not make any progress.
 The same seems to apply to memory.
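
For reference, a hedged sketch of the capping behaviour that the yarn-default.xml description promises, as opposed to the exception the SchedulerUtils code above throws. This is not Hadoop code; plain ints stand in for the Resource capability and the class name is illustrative.

{code}
public class VcoreCapExample {
  static int capVcores(int requestedVcores, int maxAllocationVcores) {
    if (requestedVcores < 0) {
      throw new IllegalArgumentException("requested virtual cores < 0");
    }
    // "Requests higher than this ... will get capped to this value."
    return Math.min(requestedVcores, maxAllocationVcores);
  }

  public static void main(String[] args) {
    System.out.println(capVcores(32, 3)); // prints 3 instead of failing the ask
  }
}
{code}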



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269221#comment-14269221
 ] 

Hudson commented on YARN-2880:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #67 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/67/])
Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev 
ef237bd52fc570292a7e608b373b51dd6d1590b8)
* hadoop-yarn-project/CHANGES.txt


 Add a test in TestRMRestart to make sure node labels will be recovered if it 
 is enabled
 ---

 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, 
 YARN-2880.1.patch, YARN-2880.2.patch


 As suggested by [~ozawa], 
 [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
  We should have such a test to make sure there will be no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269220#comment-14269220
 ] 

Hudson commented on YARN-2936:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #67 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/67/])
YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on 
getProto method. Contributed by Varun Saxena (jianhe: rev 
2638f4d0f0da375b0dd08f3188057637ed0f32d5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java


 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, 
 YARN-2936.006.patch


 After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call its getProto(), we will just get an empty proto object.
 It seems to do no harm to the production code path, as we will always call 
 getBytes() before using proto to persist the DT in the state store, when we 
 generate the password.
 I think the setters were removed to avoid duplicating the field setting when 
 getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone. YARNDelegationTokenIdentifier is tightly coupled with the 
 logic in the secretManager. It's vulnerable if something is changed in the 
 secretManager. For example, in the test case of YARN-2837, I spent time 
 figuring out that we need to execute getBytes() first to make sure the testing 
 DTs can be properly put into the state store.
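
A generic, hedged illustration of the coupling described above (deliberately not the real YARNDelegationTokenIdentifier): when the builder is only populated inside getBytes(), calling getProto() first returns an empty message. All names are illustrative; a StringBuilder stands in for the protobuf builder.

{code}
public class LazyBuilderSketch {
  private final String owner = "user1";
  private final StringBuilder protoBuilder = new StringBuilder(); // stands in for the PB builder

  byte[] getBytes() {
    protoBuilder.setLength(0);
    protoBuilder.append("owner=").append(owner); // fields are copied into the builder only here
    return protoBuilder.toString().getBytes();
  }

  String getProto() {
    return protoBuilder.toString(); // empty unless getBytes() ran first
  }

  public static void main(String[] args) {
    LazyBuilderSketch id = new LazyBuilderSketch();
    System.out.println("before getBytes(): '" + id.getProto() + "'"); // empty
    id.getBytes();
    System.out.println("after  getBytes(): '" + id.getProto() + "'"); // populated
  }
}
{code}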



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269248#comment-14269248
 ] 

Hudson commented on YARN-2880:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #801 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/801/])
Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev 
ef237bd52fc570292a7e608b373b51dd6d1590b8)
* hadoop-yarn-project/CHANGES.txt


 Add a test in TestRMRestart to make sure node labels will be recovered if it 
 is enabled
 ---

 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, 
 YARN-2880.1.patch, YARN-2880.2.patch


 As suggested by [~ozawa], 
 [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
  We should have such a test to make sure there will be no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269250#comment-14269250
 ] 

Hudson commented on YARN-2230:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #801 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/801/])
YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by 
Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this exception is 
 handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 non-obvious to discover why a job does not make any progress.
 The same seems to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269247#comment-14269247
 ] 

Hudson commented on YARN-2936:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #801 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/801/])
YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on 
getProto method. Contributed by Varun Saxena (jianhe: rev 
2638f4d0f0da375b0dd08f3188057637ed0f32d5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* hadoop-yarn-project/CHANGES.txt


 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, 
 YARN-2936.006.patch


 After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call its getProto(), we will just get an empty proto object.
 It seems to do no harm to the production code path, as we will always call 
 getBytes() before using proto to persist the DT in the state store, when we 
 generate the password.
 I think the setters were removed to avoid duplicating the field setting when 
 getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone. YARNDelegationTokenIdentifier is tightly coupled with the 
 logic in the secretManager. It's vulnerable if something is changed in the 
 secretManager. For example, in the test case of YARN-2837, I spent time 
 figuring out that we need to execute getBytes() first to make sure the testing 
 DTs can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269244#comment-14269244
 ] 

Hudson commented on YARN-3010:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #801 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/801/])
YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi 
Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3010.001.patch, YARN-3010.002.patch


 A new findbugs issue was reported recently in latest trunk: 
 {quote}
 IS: Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
  locked 91% of time
 {quote}
 https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-01-08 Thread nijel (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nijel reassigned YARN-3018:
---

Assignee: nijel

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Assignee: nijel
Priority: Trivial

 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If they expect the values in the file to be the default 
 values, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-01-08 Thread nijel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269058#comment-14269058
 ] 

nijel commented on YARN-3018:
-

Please give your opinion.
I prefer to have the value as -1 in the file as well.

If that sounds good, I can upload a patch.

 Unify the default value for yarn.scheduler.capacity.node-locality-delay in 
 code and default xml file
 

 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Priority: Trivial

 For the configuration item yarn.scheduler.capacity.node-locality-delay the 
 default value given in code is -1
 public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;
 In the default capacity-scheduler.xml file in the resource manager config 
 directory it is 40.
 Can it be unified to avoid confusion when the user creates the file without 
 this configuration? If they expect the values in the file to be the default 
 values, they will be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3018) Unify the default value for yarn.scheduler.capacity.node-locality-delay in code and default xml file

2015-01-08 Thread nijel (JIRA)
nijel created YARN-3018:
---

 Summary: Unify the default value for 
yarn.scheduler.capacity.node-locality-delay in code and default xml file
 Key: YARN-3018
 URL: https://issues.apache.org/jira/browse/YARN-3018
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: nijel
Priority: Trivial


For the configuration item yarn.scheduler.capacity.node-locality-delay the 
default value given in code is -1

public static final int DEFAULT_NODE_LOCALITY_DELAY = -1;

In the default capacity-scheduler.xml file in the resource manager config 
directory it is 40.

Can it be unified to avoid confusion when the user creates the file without 
this configuration? If they expect the values in the file to be the default values, 
they will be wrong.
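
A small, self-contained sketch of why the two defaults diverge in practice; the key string is the one from this issue, the constant mirrors the code-level default quoted above, and the class name is illustrative.

{code}
import org.apache.hadoop.conf.Configuration;

public class LocalityDelayDefault {
  public static final int DEFAULT_NODE_LOCALITY_DELAY = -1; // default in code

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Key absent from the file: the code-level default wins.
    System.out.println(conf.getInt(
        "yarn.scheduler.capacity.node-locality-delay", DEFAULT_NODE_LOCALITY_DELAY)); // -1
    // Key copied from the shipped capacity-scheduler.xml template: the file value wins.
    conf.setInt("yarn.scheduler.capacity.node-locality-delay", 40);
    System.out.println(conf.getInt(
        "yarn.scheduler.capacity.node-locality-delay", DEFAULT_NODE_LOCALITY_DELAY)); // 40
  }
}
{code}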



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes

2015-01-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269728#comment-14269728
 ] 

Zhijie Shen commented on YARN-2996:
---

+1. The test failure seems not to be related. Will commit the patch.

 Refine fs operations in FileSystemRMStateStore and few fixes
 

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Attachments: YARN-2996.001.patch, YARN-2996.002.patch, 
 YARN-2996.003.patch, YARN-2996.004.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* There are several places that invoke {{fs.exists}} and then 
 {{fs.getFileStatus}}; we can merge them to save one RPC call:
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + 
 ".new");
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 The {{updateFile}} is not good either: it writes the file to _output\_file_.tmp, 
 then renames it to _output\_file_.new, and then renames that to _output\_file_; 
 we can reduce one rename operation.
 Also there is one unnecessary import, we can remove it.
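
As a reference point, here is a minimal sketch of the first refinement, assuming the caller is happy to treat a missing path as null; it relies only on the stock FileSystem API, where getFileStatus throws FileNotFoundException for an absent path. The class and method names are illustrative, not from the patch.

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleLookupSketch {
  // One getFileStatus() RPC instead of exists() + getFileStatus();
  // a missing path is reported as null, matching the old fs.exists(p) == false branch.
  static FileStatus statusOrNull(FileSystem fs, Path p) throws IOException {
    try {
      return fs.getFileStatus(p);
    } catch (FileNotFoundException e) {
      return null;
    }
  }
}
{code}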



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2015-01-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269720#comment-14269720
 ] 

Zhijie Shen commented on YARN-2880:
---

Is 
https://builds.apache.org/job/PreCommit-YARN-Build/6274/testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testShouldNotCountFailureToMaxAttemptRetry/
 related to the change here?

 Add a test in TestRMRestart to make sure node labels will be recovered if it 
 is enabled
 ---

 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, 
 YARN-2880.1.patch, YARN-2880.2.patch


 As suggested by [~ozawa], 
 [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
  We should have such a test to make sure there will be no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269754#comment-14269754
 ] 

Hudson commented on YARN-2996:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6830 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6830/])
YARN-2996. Improved synchronization and I/O operations of FS- and Mem- 
RMStateStore. Contributed by Yi Liu. (zjshen: rev 
dc2eaa26b20cfbbcdd5784bb8761d08a42f29605)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java


 Refine fs operations in FileSystemRMStateStore and few fixes
 

 Key: YARN-2996
 URL: https://issues.apache.org/jira/browse/YARN-2996
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.7.0

 Attachments: YARN-2996.001.patch, YARN-2996.002.patch, 
 YARN-2996.003.patch, YARN-2996.004.patch


 In {{FileSystemRMStateStore}}, we can refine some fs operations to improve 
 performance:
 *1.* There are several places that invoke {{fs.exists}} and then 
 {{fs.getFileStatus}}; we can merge them to save one RPC call:
 {code}
 if (fs.exists(versionNodePath)) {
 FileStatus status = fs.getFileStatus(versionNodePath);
 {code}
 *2.*
 {code}
 protected void updateFile(Path outputPath, byte[] data) throws Exception {
   Path newPath = new Path(outputPath.getParent(), outputPath.getName() + 
 ".new");
   // use writeFile to make sure .new file is created atomically
   writeFile(newPath, data);
   replaceFile(newPath, outputPath);
 }
 {code}
 The {{updateFile}} is not good either: it writes the file to _output\_file_.tmp, 
 then renames it to _output\_file_.new, and then renames that to _output\_file_; 
 we can reduce one rename operation.
 Also there is one unnecessary import, we can remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-08 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269784#comment-14269784
 ] 

Mayank Bansal commented on YARN-2933:
-

Thanks [~wangda] and Sunil for review.

bq. In addition to the previous comment, I think we put an incorrect #container for 
each application when setLabelContainer=true. The usedResource or "current" 
in TestProportionalPreemptionPolicy actually means the used resource of nodes 
without labels. So if we want to have labeled containers in an application, we 
should make them stay outside of usedResource.

I don't think that's needed, as the basic functionality of the test is to 
demonstrate that we can skip labeled containers, so I think it does not matter.

bq. And testSkipLabeledContainer is fully covered by 
testIdealAllocationForLabels. Since we have already checked #container 
preempted in each application in testIdealAllocationForLabels, which implies 
labeled containers are ignored.
Agreed 

bq. A minor suggestion is to rename setLabelContainer to setLabeledContainer
Agreed

bq. An application's containers (if no labels were specified during submission) 
may fall onto nodes that are labelled or not labelled. Am I 
correct?

No. As of now, containers with no labels cannot go to labeled nodes.

Thanks,
Mayank


 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 is targeting preemption that respects node labels, but we have 
 some gaps in the code base, like queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement, YARN-2498 will consider 
 preemption respecting node-labels for Capacity Scheduler which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-08 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2933:

Attachment: YARN-2933-6.patch

Attaching patch

Thanks,
Mayank

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 is targeting preemption that respects node labels, but we have 
 some gaps in the code base, like queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regressions like: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement, YARN-2498 will consider 
 preemption respecting node-labels for Capacity Scheduler which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state

2015-01-08 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2421:
--
Target Version/s: 2.7.0  (was: 2.6.0)

 CapacityScheduler still allocates containers to an app in the FINISHING state
 -

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Thomas Graves
Assignee: chang li
 Attachments: yarn2421.patch, yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call into allocate.  The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers.   
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-01-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: (was: YARN-2800-20141118-1.patch)

 Create yarn cluster CLI to enable list node labels collection
 -

 Key: YARN-2786
 URL: https://issues.apache.org/jira/browse/YARN-2786
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
 YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
 YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
 YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch


 With YARN-2778, we can list node labels on existing RM nodes. But it is not 
 enough, we should be able to: 
 1) list node labels collection
 The command should start with yarn cluster ...; in the future, we can add 
 more functionality to the yarnClusterCLI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-01-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20150108-1-full.patch

 Create yarn cluster CLI to enable list node labels collection
 -

 Key: YARN-2786
 URL: https://issues.apache.org/jira/browse/YARN-2786
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
 YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
 YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
 YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
 YARN-2786-20150108-1-full.patch


 With YARN-2778, we can list node labels on existing RM nodes. But it is not 
 enough, we should be able to: 
 1) list node labels collection
 The command should start with yarn cluster ...; in the future, we can add 
 more functionality to the yarnClusterCLI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-01-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2800-20141118-1.patch

 Create yarn cluster CLI to enable list node labels collection
 -

 Key: YARN-2786
 URL: https://issues.apache.org/jira/browse/YARN-2786
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
 YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
 YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
 YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch


 With YARN-2778, we can list node labels on existing RM nodes. But it is not 
 enough, we should be able to: 
 1) list node labels collection
 The command should start with yarn cluster ...; in the future, we can add 
 more functionality to the yarnClusterCLI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-01-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20150108-1-without-yarn.cmd.patch

Updated patch addresses comments from [~aw] and fixes the findbugs warning. 

Please kindly review, thanks.

 Create yarn cluster CLI to enable list node labels collection
 -

 Key: YARN-2786
 URL: https://issues.apache.org/jira/browse/YARN-2786
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
 YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
 YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
 YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
 YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch


 With YARN-2778, we can list node labels on existing RM nodes. But it is not 
 enough, we should be able to: 
 1) list node labels collection
 The command should start with yarn cluster ...; in the future, we can add 
 more functionality to the yarnClusterCLI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager

2015-01-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269824#comment-14269824
 ] 

Wangda Tan commented on YARN-3016:
--

I meant the methods in CommonNodeLabelsManager that start with "internal"; we 
should have a way to make them easier to modify.

 (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in 
 CommonNodeLabelsManager
 -

 Key: YARN-3016
 URL: https://issues.apache.org/jira/browse/YARN-3016
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan

 Now we have separate but similar implementations for add/remove/replace 
 labels on node in CommonNodeLabelsManager; we should merge them into a single one 
 to make them easier to modify and improve readability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269359#comment-14269359
 ] 

Hudson commented on YARN-2230:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1999 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1999/])
YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by 
Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to documentation - yarn-default.xml 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
  the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this exception is 
 handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 non-obvious to discover why a job does not make any progress.
 The same seems to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269353#comment-14269353
 ] 

Hudson commented on YARN-3010:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1999 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1999/])
YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi 
Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3010.001.patch, YARN-3010.002.patch


 A new findbugs issue was reported recently in latest trunk: 
 {quote}
 IS: Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext;
  locked 91% of time
 {quote}
 https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269356#comment-14269356
 ] 

Hudson commented on YARN-2936:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1999 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1999/])
YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on 
getProto method. Contributed by Varun Saxena (jianhe: rev 
2638f4d0f0da375b0dd08f3188057637ed0f32d5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java


 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, 
 YARN-2936.006.patch


 After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call its getProto(), we just get an empty proto object.
 It seems to do no harm on the production code path, as we always call 
 getBytes() before using the proto to persist the DT in the state store, when 
 generating the password.
 I think the setters were removed to avoid setting the fields twice when 
 getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone: it is tightly coupled with the logic in the secretManager, and 
 it is vulnerable if something is changed in the secretManager. For example, in 
 the test case of YARN-2837, I spent time figuring out that we need to call 
 getBytes() first to make sure the testing DTs can be properly put into the 
 state store.
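
 As a minimal sketch of the fixed pattern the commit message describes (push the fields into the proto only when getProto() is called, so getBytes() never serializes an empty object), with invented stand-in types rather than the real protobuf classes:
 {code}
 import java.nio.charset.StandardCharsets;

 // Hypothetical identifier mirroring the described fix; the nested Proto class
 // is a stand-in for the generated protobuf message, not the real one.
 public class SampleTokenIdentifier {

   /** Minimal stand-in for the generated protobuf message. */
   static final class Proto {
     final String owner;
     final String renewer;

     Proto(String owner, String renewer) {
       this.owner = owner;
       this.renewer = renewer;
     }

     byte[] toByteArray() {
       return (owner + "," + renewer).getBytes(StandardCharsets.UTF_8);
     }
   }

   private final String owner;
   private final String renewer;

   public SampleTokenIdentifier(String owner, String renewer) {
     this.owner = owner;
     this.renewer = renewer;
   }

   /** Materialize the proto from the identifier fields on every call. */
   public Proto getProto() {
     return new Proto(owner, renewer);
   }

   /** Serialization path used when persisting the token. */
   public byte[] getBytes() {
     return getProto().toByteArray();
   }
 }
 {code}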



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269357#comment-14269357
 ] 

Hudson commented on YARN-2880:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1999 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1999/])
Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev 
ef237bd52fc570292a7e608b373b51dd6d1590b8)
* hadoop-yarn-project/CHANGES.txt


 Add a test in TestRMRestart to make sure node labels will be recovered if it 
 is enabled
 ---

 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, 
 YARN-2880.1.patch, YARN-2880.2.patch


 As suggested by [~ozawa], 
 [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
 We should have such a test to make sure there will be no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream

2015-01-08 Thread Andrew Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269373#comment-14269373
 ] 

Andrew Johnson commented on YARN-2893:
--

Yeah, that definitely seems like it's worth a look. Is there anything specific 
I should look out for?

 AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
 --

 Key: YARN-2893
 URL: https://issues.apache.org/jira/browse/YARN-2893
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov

 MapReduce jobs on our clusters experience sporadic failures due to corrupt 
 tokens in the AM launch context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269380#comment-14269380
 ] 

Hudson commented on YARN-2880:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #64 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/64/])
Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev 
ef237bd52fc570292a7e608b373b51dd6d1590b8)
* hadoop-yarn-project/CHANGES.txt


 Add a test in TestRMRestart to make sure node labels will be recovered if it 
 is enabled
 ---

 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, 
 YARN-2880.1.patch, YARN-2880.2.patch


 As suggested by [~ozawa], 
 [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
 We should have such a test to make sure there will be no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269376#comment-14269376
 ] 

Hudson commented on YARN-3010:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #64 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/64/])
YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi 
Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3010.001.patch, YARN-3010.002.patch


 A new findbugs issue was reported recently in latest trunk: 
 {quote}
 IS: Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; 
 locked 91% of time
 {quote}
 https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269382#comment-14269382
 ] 

Hudson commented on YARN-2230:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #64 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/64/])
YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by 
Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to the documentation (yarn-default.xml, 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml),
 the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) 
 is submitted, it does not make any progress. The warnings/exceptions are thrown 
 at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.
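
 To make the documentation/code mismatch concrete, here is a small, hypothetical Java comparison of the two behaviors (the documented silent cap versus the observed hard rejection). The constant and the class are illustrative only; this is not the RM's actual code path.
 {code}
 // Illustrative comparison; MAX_VCORES stands in for
 // yarn.scheduler.maximum-allocation-vcores on the cluster from the log above.
 public class VcoresValidation {

   static final int MAX_VCORES = 3;

   /** Documented behavior: requests above the limit are silently capped. */
   static int capToMax(int requestedVcores) {
     return Math.min(requestedVcores, MAX_VCORES);
   }

   /** Observed behavior: out-of-range requests are rejected outright. */
   static int validateOrThrow(int requestedVcores) {
     if (requestedVcores < 0 || requestedVcores > MAX_VCORES) {
       throw new IllegalArgumentException(
           "Invalid resource request, requestedVirtualCores=" + requestedVcores
               + ", maxVirtualCores=" + MAX_VCORES);
     }
     return requestedVcores;
   }

   public static void main(String[] args) {
     System.out.println(capToMax(32));        // prints 3
     System.out.println(validateOrThrow(32)); // throws, mirroring the RM log
   }
 }
 {code}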



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269411#comment-14269411
 ] 

Hudson commented on YARN-3010:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #68 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/68/])
YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi 
Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3010.001.patch, YARN-3010.002.patch


 A new findbugs issue was reported recently in latest trunk: 
 {quote}
 IS: Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; 
 locked 91% of time
 {quote}
 https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269415#comment-14269415
 ] 

Hudson commented on YARN-2880:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #68 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/68/])
Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev 
ef237bd52fc570292a7e608b373b51dd6d1590b8)
* hadoop-yarn-project/CHANGES.txt


 Add a test in TestRMRestart to make sure node labels will be recovered if it 
 is enabled
 ---

 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, 
 YARN-2880.1.patch, YARN-2880.2.patch


 As suggested by [~ozawa], 
 [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
 We should have such a test to make sure there will be no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269417#comment-14269417
 ] 

Hudson commented on YARN-2230:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #68 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/68/])
YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by 
Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to the documentation (yarn-default.xml, 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml),
 the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) 
 is submitted, it does not make any progress. The warnings/exceptions are thrown 
 at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269432#comment-14269432
 ] 

Hudson commented on YARN-2230:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2018 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2018/])
YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by 
Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Assignee: Vijay Bhat
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2230.001.patch, YARN-2230.002.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores  is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to the documentation (yarn-default.xml, 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml),
 the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not).
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than yarn.scheduler.maximum-allocation-vcores) 
 is submitted, it does not make any progress. The warnings/exceptions are thrown 
 at the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
 .
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269426#comment-14269426
 ] 

Hudson commented on YARN-3010:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2018 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2018/])
YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi 
Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java


 Fix recent findbug issue in AbstractYarnScheduler
 -

 Key: YARN-3010
 URL: https://issues.apache.org/jira/browse/YARN-3010
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-3010.001.patch, YARN-3010.002.patch


 A new findbugs issue was reported recently in latest trunk: 
 {quote}
 IS: Inconsistent synchronization of 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; 
 locked 91% of time
 {quote}
 https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760
 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269379#comment-14269379
 ] 

Hudson commented on YARN-2936:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #64 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/64/])
YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on 
getProto method. Contributed by Varun Saxena (jianhe: rev 
2638f4d0f0da375b0dd08f3188057637ed0f32d5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java


 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, 
 YARN-2936.006.patch


 After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call its getProto(), we just get an empty proto object.
 It seems to do no harm on the production code path, as we always call 
 getBytes() before using the proto to persist the DT in the state store, when 
 generating the password.
 I think the setters were removed to avoid setting the fields twice when 
 getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone: it is tightly coupled with the logic in the secretManager, and 
 it is vulnerable if something is changed in the secretManager. For example, in 
 the test case of YARN-2837, I spent time figuring out that we need to call 
 getBytes() first to make sure the testing DTs can be properly put into the 
 state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2571) RM to support YARN registry

2015-01-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269548#comment-14269548
 ] 

Steve Loughran commented on YARN-2571:
--

Vinod, 

having the RM create the user paths allows the registry to be set up with the 
correct permissions as YARN jobs are created. Without that, if there is no path 
for that user set up, the application is likely to fail post-launch with some 
error. 

For cleanup, automatic purging of records keeps the registry data somewhat 
under control, without applications having to go to the effort of writing these 
not-yet-implemented cleanup containers. It's not a particularly complex piece 
of code; there are tests for the distributed shell that verify it works in 
YARN-2646.
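
As a rough illustration of the per-user path setup being discussed, here is a hedged sketch using the plain ZooKeeper client. The registry path layout, the SASL principal scheme, and the create/read/delete permission set are assumptions for illustration; this is not the RM registry service code.
{code}
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;

public class UserPathSetup {

  /** Create /registry/users/$username with CRD permissions for that user. */
  public static void ensureUserPath(ZooKeeper zk, String username) throws Exception {
    String path = "/registry/users/" + username;   // illustrative layout
    int perms = ZooDefs.Perms.CREATE | ZooDefs.Perms.READ | ZooDefs.Perms.DELETE;
    List<ACL> acls = Collections.singletonList(
        new ACL(perms, new Id("sasl", username)));  // user principal via SASL
    // Parent paths are assumed to already exist (created at RM startup).
    if (zk.exists(path, false) == null) {
      zk.create(path, new byte[0], acls, CreateMode.PERSISTENT);
    }
  }

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });
    ensureUserPath(zk, "alice");
    zk.close();
  }
}
{code}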

 RM to support YARN registry 
 

 Key: YARN-2571
 URL: https://issues.apache.org/jira/browse/YARN-2571
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
 YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
 YARN-2571-008.patch, YARN-2571-009.patch


 The RM needs to (optionally) integrate with the YARN registry:
 # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
 principals)
 # app-launch: create the user directory /users/$username with the relevant 
 permissions (CRD) for them to create subnodes.
 # attempt, container, app completion: remove service records with the 
 matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269414#comment-14269414
 ] 

Hudson commented on YARN-2936:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #68 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/68/])
YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on 
getProto method. Contributed by Varun Saxena (jianhe: rev 
2638f4d0f0da375b0dd08f3188057637ed0f32d5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java


 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, 
 YARN-2936.006.patch


 After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call its getProto(), we just get an empty proto object.
 It seems to do no harm on the production code path, as we always call 
 getBytes() before using the proto to persist the DT in the state store, when 
 generating the password.
 I think the setters were removed to avoid setting the fields twice when 
 getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone: it is tightly coupled with the logic in the secretManager, and 
 it is vulnerable if something is changed in the secretManager. For example, in 
 the test case of YARN-2837, I spent time figuring out that we need to call 
 getBytes() first to make sure the testing DTs can be properly put into the 
 state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269429#comment-14269429
 ] 

Hudson commented on YARN-2936:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2018 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2018/])
YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on 
getProto method. Contributed by Varun Saxena (jianhe: rev 
2638f4d0f0da375b0dd08f3188057637ed0f32d5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java


 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Fix For: 2.7.0

 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, 
 YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, 
 YARN-2936.006.patch


 After YARN-2743, the setters were removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call its getProto(), we just get an empty proto object.
 It seems to do no harm on the production code path, as we always call 
 getBytes() before using the proto to persist the DT in the state store, when 
 generating the password.
 I think the setters were removed to avoid setting the fields twice when 
 getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
 properly alone: it is tightly coupled with the logic in the secretManager, and 
 it is vulnerable if something is changed in the secretManager. For example, in 
 the test case of YARN-2837, I spent time figuring out that we need to call 
 getBytes() first to make sure the testing DTs can be properly put into the 
 state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269430#comment-14269430
 ] 

Hudson commented on YARN-2880:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2018 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2018/])
Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev 
ef237bd52fc570292a7e608b373b51dd6d1590b8)
* hadoop-yarn-project/CHANGES.txt


 Add a test in TestRMRestart to make sure node labels will be recovered if it 
 is enabled
 ---

 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Rohith
 Fix For: 2.7.0

 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, 
 YARN-2880.1.patch, YARN-2880.2.patch


 As suggested by [~ozawa], 
 [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
 We should have such a test to make sure there will be no regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2571) RM to support YARN registry

2015-01-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269556#comment-14269556
 ] 

Steve Loughran commented on YARN-2571:
--

Xuan

1. If you can show me an example of an active service to start with, I'll 
gladly make it active only

2. We're relying on operations being idempotent: whoever creates last wins, 
whoever deletes last wins. There are some race conditions on cleanup if there's a 
change between a read and a delete, but that's what you get in a world without 
transactions.

 RM to support YARN registry 
 

 Key: YARN-2571
 URL: https://issues.apache.org/jira/browse/YARN-2571
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
 YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
 YARN-2571-008.patch, YARN-2571-009.patch


 The RM needs to (optionally) integrate with the YARN registry:
 # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
 principals)
 # app-launch: create the user directory /users/$username with the relevant 
 permissions (CRD) for them to create subnodes.
 # attempt, container, app completion: remove service records with the 
 matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269836#comment-14269836
 ] 

Wangda Tan commented on YARN-2933:
--

[~mayank_bansal],
Thanks for updating the patch,

bq. I don't think thats needed as the basic functionality for the test is to 
demonstrate we can skip labeled container, So I think it does not mater.
I think it matters: this test should not only verify that we can skip labeled 
containers, but also demonstrate that we compute ideal_allocation correctly. I 
still suggest adding the one-line change I mentioned in 
https://issues.apache.org/jira/browse/YARN-2933?focusedCommentId=14268631page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14268631
to verify that ideal_allocation is correctly computed.

And for [~sunilg]'s comment, I think Mayank has already answered you.

Thanks,
Wangda



 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 targets preemption that respects node labels, but we have some gaps 
 in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid a 
 regression like: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node-labels for Capacity Scheduler, which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269858#comment-14269858
 ] 

Jian He commented on YARN-2933:
---

One other comment: getNodeLabels is not used anywhere; we can remove it.

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 targets preemption that respects node labels, but we have some gaps 
 in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid a 
 regression like: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node-labels for Capacity Scheduler, which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269868#comment-14269868
 ] 

Sunil G commented on YARN-2933:
---

Hi [~mayank_bansal]
Thank you for the clarification.

I have one more small nit in a test case

{code}
   if (setAMContainer && i == 0) {
     cLive.add(mockContainer(appAttId, cAlloc, unit, 0));
   } else if (setLabeledContainer && i == 1) {
     cLive.add(mockContainer(appAttId, cAlloc, unit, 2));
   } else {
     cLive.add(mockContainer(appAttId, cAlloc, unit, 1));
   }
{code}


For *mockContainer*, the last parameter is an integer: 0 for an AM container, 
1 for a normal container, and 2 for a labelled container.
Could we make this more generic with a named constant or an enum? In the future 
that would make it easier to add new container types, and it would improve 
readability.
We could keep the different container types in an enum, extend it as needed 
later, and use it to create the mock container, as in the sketch below.
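
A minimal sketch of that suggestion, with purely illustrative names (the enum values and their mapping to the existing integer codes are assumptions, not the real test):
{code}
// Illustrative enum for the mockContainer type argument.
enum MockContainerType {
  AM(0),        // application master container
  NORMAL(1),    // ordinary task container
  LABELED(2);   // container placed on a labeled node

  private final int code;   // value currently passed as the last int argument

  MockContainerType(int code) {
    this.code = code;
  }

  int code() {
    return code;
  }
}
{code}
A call site could then read {{mockContainer(appAttId, cAlloc, unit, MockContainerType.LABELED.code())}}, or mockContainer could be overloaded to take the enum directly.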

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 targets preemption that respects node labels, but we have some gaps 
 in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid a 
 regression like: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node-labels for Capacity Scheduler, which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269875#comment-14269875
 ] 

Hadoop QA commented on YARN-2933:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690901/YARN-2933-6.patch
  against trunk revision 20625c8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 29 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-gridmix.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6279//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6279//artifact/patchprocess/newPatchFindbugsWarningshadoop-gridmix.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6279//console

This message is automatically generated.

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, 
 YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have a preemption policy to support that. 
 YARN-2498 targets preemption that respects node labels, but we have some gaps 
 in the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially require 
 refactoring CS, which we need to spend some time thinking about carefully.
 For now, what we can do immediately is calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid a 
 regression like: a cluster has some nodes with labels and some without; assume 
 queueA isn't satisfied for resources without labels, but for now the preemption 
 policy may preempt resources from nodes with labels for queueA, which is not 
 correct.
 Again, it is just a short-term enhancement; YARN-2498 will consider 
 preemption respecting node-labels for Capacity Scheduler, which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269878#comment-14269878
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12690906/YARN-2786-20150108-1-without-yarn.cmd.patch
  against trunk revision 20625c8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6281//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6281//console

This message is automatically generated.

 Create yarn cluster CLI to enable list node labels collection
 -

 Key: YARN-2786
 URL: https://issues.apache.org/jira/browse/YARN-2786
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
 YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
 YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
 YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
 YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch


 With YARN-2778, we can list node labels on existing RM nodes. But that is not 
 enough; we should also be able to: 
 1) list the node labels collection
 The command should start with yarn cluster ...; in the future, we can add 
 more functionality to the yarnClusterCLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2997) NM keeps sending already-sent completed containers to RM until containers are acked

2015-01-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2997:
--
Summary: NM keeps sending already-sent completed containers to RM until 
containers are acked  (was: NM keeps sending finished containers to RM until 
app is finished)

 NM keeps sending already-sent completed containers to RM until containers are 
 acked
 ---

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
 YARN-2997.5.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by the NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container has already been released, hence 
 {{getRMContainer}} returns null.
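
 As a rough sketch of the reporting pattern the fix moves towards (keep resending completed container statuses only until the RM acknowledges them, then drop them), using simplified placeholder types rather than the real NodeStatusUpdaterImpl structures:
 {code}
 import java.util.ArrayList;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Set;

 // Simplified illustration; container IDs are plain strings here.
 public class CompletedContainerReporter {

   private final Set<String> pendingCompleted = new HashSet<>();

   /** Called when a container finishes locally on the NM. */
   public synchronized void containerCompleted(String containerId) {
     pendingCompleted.add(containerId);
   }

   /** Completed statuses to include in the next heartbeat. */
   public synchronized List<String> getStatusesForHeartbeat() {
     return new ArrayList<>(pendingCompleted);
   }

   /** Called with the IDs the RM acknowledged in its heartbeat response. */
   public synchronized void onAcked(List<String> ackedContainerIds) {
     pendingCompleted.removeAll(ackedContainerIds);
   }
 }
 {code}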



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2997) NM keeps sending already-sent completed containers to RM until containers are removed from context

2015-01-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2997:
--
Summary: NM keeps sending already-sent completed containers to RM until 
containers are removed from context  (was: NM keeps sending already-sent 
completed containers to RM until containers are acked)

 NM keeps sending already-sent completed containers to RM until containers are 
 removed from context
 --

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
 YARN-2997.5.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by the NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container has already been released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2997) NM keeps sending already-sent completed containers to RM until containers are removed from context

2015-01-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269885#comment-14269885
 ] 

Jian He commented on YARN-2997:
---

patch looks good to me.

 NM keeps sending already-sent completed containers to RM until containers are 
 removed from context
 --

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
 YARN-2997.5.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by the NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container has already been released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2997) NM keeps sending already-sent completed containers to RM until containers are removed from context

2015-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269918#comment-14269918
 ] 

Hudson commented on YARN-2997:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6833 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6833/])
YARN-2997. Fixed NodeStatusUpdater to not send alreay-sent completed container 
statuses on heartbeat. Contributed by Chengbing Liu (jianhe: rev 
cc2a745f7e82c9fa6de03242952347c54c52dccc)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


 NM keeps sending already-sent completed containers to RM until containers are 
 removed from context
 --

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Fix For: 2.7.0

 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
 YARN-2997.5.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by the NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container has already been released, hence 
 {{getRMContainer}} returns null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3014) Replaces labels on a host should update all NM's labels on that host

2015-01-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3014:
-
Summary: Replaces labels on a host should update all NM's labels on that 
host  (was: Changing labels on a host should update all NM's labels on that 
host)

 Replaces labels on a host should update all NM's labels on that host
 

 Key: YARN-3014
 URL: https://issues.apache.org/jira/browse/YARN-3014
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3014-1.patch


 An admin can specify labels either on a host (by running {{yarn rmadmin 
 -replaceLabelsOnNode host1,label1}}) or on a single NM (by running {{yarn 
 rmadmin -replaceLabelsOnNode host1:port,label1}}).
 If a user has specified label=x on an NM (instead of on the host), and later 
 sets label=y on the NM's host, the NM's label should be updated to y as well.
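
A minimal sketch of the intended semantics, using a toy label store rather than CommonNodeLabelsManager's real internals ({{HostLabelStore}}, {{WILDCARD_PORT}} and {{replaceLabels}} below are hypothetical names):

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy sketch of the intended behavior, not CommonNodeLabelsManager's real
 * internals. Port 0 stands in for "the host itself" (a wildcard), so replacing
 * labels on a bare host overwrites the labels of every NM (host:port) known
 * under that host, while replacing on host:port touches only that NM.
 */
public class HostLabelStore {

  private static final int WILDCARD_PORT = 0;

  // host -> (port -> labels); the WILDCARD_PORT entry holds the host-level labels.
  private final Map<String, Map<Integer, Set<String>>> labels = new HashMap<>();

  public void replaceLabels(String host, int port, Set<String> newLabels) {
    Map<Integer, Set<String>> perNode =
        labels.computeIfAbsent(host, h -> new HashMap<>());
    if (port == WILDCARD_PORT) {
      // Host-level replace: overwrite every registered NM plus the host entry.
      perNode.replaceAll((p, old) -> newLabels);
      perNode.put(WILDCARD_PORT, newLabels);
    } else {
      // NM-level replace: only this host:port entry changes.
      perNode.put(port, newLabels);
    }
  }

  public Set<String> getLabels(String host, int port) {
    Map<Integer, Set<String>> perNode =
        labels.getOrDefault(host, Collections.emptyMap());
    return perNode.getOrDefault(port,
        perNode.getOrDefault(WILDCARD_PORT, Collections.emptySet()));
  }
}
{code}

With this behavior, setting label=x on host1:port and later replacing labels on host1 with y leaves host1:port labeled y, matching the description above.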



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3014) Changing labels on a host should update all NM's labels on that host

2015-01-08 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3014:
-
Attachment: YARN-3014-1.patch

Attached a patch and kicked Jenkins.
This patch contains the refactoring described in YARN-3016, which merges 
internalAdd/Remove/ReplaceLabels into one method.

 Changing labels on a host should update all NM's labels on that host
 

 Key: YARN-3014
 URL: https://issues.apache.org/jira/browse/YARN-3014
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3014-1.patch


 An admin can specify labels either on a host (by running {{yarn rmadmin 
 -replaceLabelsOnNode host1,label1}}) or on a single NM (by running {{yarn 
 rmadmin -replaceLabelsOnNode host1:port,label1}}).
 If a user has specified label=x on an NM (instead of on the host), and later 
 sets label=y on the NM's host, the NM's label should be updated to y as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269973#comment-14269973
 ] 

Hadoop QA commented on YARN-3015:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690927/YARN-3015.001.patch
  against trunk revision cc2a745.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6285//console

This message is automatically generated.

 yarn classpath command should support same options as hadoop classpath.
 ---

 Key: YARN-3015
 URL: https://issues.apache.org/jira/browse/YARN-3015
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3015.001.patch


 HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional 
 expansion of the wildcards and bundling the classpath into a jar file 
 containing a manifest with the Class-Path attribute. The other classpath 
 commands should do the same for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-712) RMDelegationTokenSecretManager shouldn't start in non-secure mode

2015-01-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-712.
--
Resolution: Won't Fix

 RMDelegationTokenSecretManager shouldn't start in non-secure mode 
 --

 Key: YARN-712
 URL: https://issues.apache.org/jira/browse/YARN-712
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He

 RM will just be doing useless work as no tokens are issued.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3019) Enable work-preserving restart by default

2015-01-08 Thread Jian He (JIRA)
Jian He created YARN-3019:
-

 Summary: Enable work-preserving restart by default 
 Key: YARN-3019
 URL: https://issues.apache.org/jira/browse/YARN-3019
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He


The proposal is to set yarn.resourcemanager.work-preserving-recovery.enabled 
to true by default. 
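
For illustration only, a small sketch of what the flipped default means for code that reads the flag; the property name is the one quoted above, and the class below is hypothetical:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical illustration, not RM code: the second argument to getBoolean()
// is the value used when yarn-site.xml does not set the property; the proposal
// is to change that default from false to true. Operators can still opt out by
// setting the property to false explicitly.
public class RecoveryFlagCheck {
  private static final String WORK_PRESERVING_RECOVERY_ENABLED =
      "yarn.resourcemanager.work-preserving-recovery.enabled";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    boolean enabled = conf.getBoolean(WORK_PRESERVING_RECOVERY_ENABLED, true);
    System.out.println("RM work-preserving recovery enabled: " + enabled);
  }
}
{code}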



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3019) Enable RM work-preserving restart by default

2015-01-08 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3019:
--
Summary: Enable RM work-preserving restart by default   (was: Enable 
work-preserving restart by default )

 Enable RM work-preserving restart by default 
 -

 Key: YARN-3019
 URL: https://issues.apache.org/jira/browse/YARN-3019
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 The proposal is to set 
 yarn.resourcemanager.work-preserving-recovery.enabled to true by default. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers

2015-01-08 Thread Peter D Kirchner (JIRA)
Peter D Kirchner created YARN-3020:
--

 Summary: n similar addContainerRequest()s produce n*(n+1)/2 
containers
 Key: YARN-3020
 URL: https://issues.apache.org/jira/browse/YARN-3020
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.5.2, 2.5.1, 2.6.0, 2.5.0
Reporter: Peter D Kirchner


BUG: If the application master calls addContainerRequest() n times with the 
same priority, I get 1+2+3+...+n containers = n*(n+1)/2.

If the application master calls addContainerRequest() n times with a unique 
priority each time, I get n containers (as intended).

Analysis:
There is a logic problem in AMRMClientImpl.java.
Although allocate() in AMRMClientImpl.java does an ask.clear(), on subsequent 
calls to addContainerRequest(), addResourceRequest() finds the previous 
matching remoteRequest and increments its container count rather than starting 
anew, and then calls addResourceRequestToAsk(), which defeats the ask.clear().

From the documentation and code comments it was hard for me to discern the 
intended behavior of the API, but the inconsistency reported in this issue 
suggests that one case or the other is implemented incorrectly.
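
To make the reported call pattern concrete, here is a sketch using the public AMRMClient API (it omits init/start and AM registration); the container counts are those reported above, not re-verified here:

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

/**
 * Sketch of the two call patterns compared in the report (not a full AM).
 * Per the report, the first loop can end up yielding n*(n+1)/2 containers
 * across later allocate() calls, while the second yields n.
 */
public class DuplicateRequestExample {
  public static void main(String[] args) {
    int n = 4;
    AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
    // init()/start() with a Configuration and registerApplicationMaster()
    // are omitted; only the request pattern is shown.

    Resource capability = Resource.newInstance(1024, 1);

    // Same priority every time: per the analysis above, each call finds the
    // previous matching remoteRequest, bumps its count, and re-adds it to ask.
    Priority samePriority = Priority.newInstance(1);
    for (int i = 0; i < n; i++) {
      amrmClient.addContainerRequest(
          new ContainerRequest(capability, null, null, samePriority));
    }

    // Workaround observed in the report: a unique priority per request.
    for (int i = 0; i < n; i++) {
      amrmClient.addContainerRequest(
          new ContainerRequest(capability, null, null, Priority.newInstance(10 + i)));
    }
  }
}
{code}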



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3014) Replaces labels on a host should update all NM's labels on that host

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270069#comment-14270069
 ] 

Hadoop QA commented on YARN-3014:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690922/YARN-3014-1.patch
  against trunk revision cc2a745.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6284//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6284//console

This message is automatically generated.

 Replaces labels on a host should update all NM's labels on that host
 

 Key: YARN-3014
 URL: https://issues.apache.org/jira/browse/YARN-3014
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-3014-1.patch


 An admin can specify labels either on a host (by running {{yarn rmadmin 
 -replaceLabelsOnNode host1,label1}}) or on a single NM (by running {{yarn 
 rmadmin -replaceLabelsOnNode host1:port,label1}}).
 If a user has specified label=x on an NM (instead of on the host), and later 
 sets label=y on the NM's host, the NM's label should be updated to y as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2141) [Umbrella] Capture container and node resource consumption

2015-01-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269919#comment-14269919
 ] 

Vinod Kumar Vavilapalli commented on YARN-2141:
---

Related to and very likely a dup of YARN-1012 which is part of a larger effort.

 [Umbrella] Capture container and node resource consumption
 --

 Key: YARN-2141
 URL: https://issues.apache.org/jira/browse/YARN-2141
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Carlo Curino
Priority: Minor

 Collecting per-container and per-node resource consumption statistics in a 
 fairly granular manner, and making them available to both infrastructure code 
 (e.g., schedulers) and users (e.g., AMs, or users directly via webapps), can 
 facilitate several kinds of performance work. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2984) Metrics for container's actual memory usage

2015-01-08 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269944#comment-14269944
 ] 

Anubhav Dhoot commented on YARN-2984:
-

Seems good. I tested it, and it shows up on the JMX page with its information.
I would rename ContainerUsageMetrics to just ContainerMetrics, as I think it is 
a good place for all container-related metrics instead of having multiple 
container metrics classes.


 Metrics for container's actual memory usage
 ---

 Key: YARN-2984
 URL: https://issues.apache.org/jira/browse/YARN-2984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: yarn-2984-prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track memory usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270004#comment-14270004
 ] 

Hadoop QA commented on YARN-1798:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12640845/YARN-1798.1.patch
  against trunk revision cc2a745.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6283//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6283//console

This message is automatically generated.

 TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, 
 TestNodeStatusUpdater fails on Linux
 -

 Key: YARN-1798
 URL: https://issues.apache.org/jira/browse/YARN-1798
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, 
 YARN-1798.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269905#comment-14269905
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12690906/YARN-2786-20150108-1-without-yarn.cmd.patch
  against trunk revision 708b1aa.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.api.impl.TestAMRMClient

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6282//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6282//console

This message is automatically generated.

 Create yarn cluster CLI to enable list node labels collection
 -

 Key: YARN-2786
 URL: https://issues.apache.org/jira/browse/YARN-2786
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
 YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
 YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, 
 YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, 
 YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch, 
 YARN-2786-20150108-1-full.patch, YARN-2786-20150108-1-without-yarn.cmd.patch


 With YARN-2778, we can list node labels on existing RM nodes. But that is not 
 enough; we should also be able to: 
 1) list the node labels collection
 The command should start with yarn cluster ...; in the future, we can add 
 more functionality to the yarnClusterCLI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2421) CapacityScheduler still allocates containers to an app in the FINISHING state

2015-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269938#comment-14269938
 ] 

Hadoop QA commented on YARN-2421:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663584/yarn2421.patch
  against trunk revision 20625c8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1215 javac 
compiler warnings (more than the trunk's current 1214 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
  
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6280//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6280//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6280//console

This message is automatically generated.

 CapacityScheduler still allocates containers to an app in the FINISHING state
 -

 Key: YARN-2421
 URL: https://issues.apache.org/jira/browse/YARN-2421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.1
Reporter: Thomas Graves
Assignee: chang li
 Attachments: yarn2421.patch, yarn2421.patch, yarn2421.patch


 I saw an instance of a bad application master where it unregistered with the 
 RM but then continued to call allocate(). The RMAppAttempt went to the 
 FINISHING state, but the capacity scheduler kept allocating it containers. 
 We should probably have the capacity scheduler check that the application 
 isn't in one of the terminal states before giving it containers. 
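
A minimal sketch of such a guard, with hypothetical names rather than the actual CapacityScheduler code:

{code:java}
import java.util.EnumSet;
import java.util.Set;

/**
 * Illustrative guard (names are hypothetical, not CapacityScheduler code):
 * skip allocation for attempts that have reached a finishing/terminal state,
 * even if they keep calling allocate().
 */
public class AllocationGuard {

  enum AttemptState { RUNNING, FINISHING, FINISHED, FAILED, KILLED }

  private static final Set<AttemptState> NON_ALLOCATABLE =
      EnumSet.of(AttemptState.FINISHING, AttemptState.FINISHED,
          AttemptState.FAILED, AttemptState.KILLED);

  /** Returns true if the scheduler should still hand containers to this attempt. */
  static boolean shouldAllocate(AttemptState state) {
    return !NON_ALLOCATABLE.contains(state);
  }

  public static void main(String[] args) {
    System.out.println(shouldAllocate(AttemptState.RUNNING));   // true
    System.out.println(shouldAllocate(AttemptState.FINISHING)); // false
  }
}
{code}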



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux

2015-01-08 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269940#comment-14269940
 ] 

Jian He commented on YARN-1798:
---

I think the issue is outdated now; maybe we can close it?

 TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, 
 TestNodeStatusUpdater fails on Linux
 -

 Key: YARN-1798
 URL: https://issues.apache.org/jira/browse/YARN-1798
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, 
 YARN-1798.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3015) yarn classpath command should support same options as hadoop classpath.

2015-01-08 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3015:
---
Attachment: YARN-3015.001.patch

 yarn classpath command should support same options as hadoop classpath.
 ---

 Key: YARN-3015
 URL: https://issues.apache.org/jira/browse/YARN-3015
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Reporter: Chris Nauroth
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3015.001.patch


 HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional 
 expansion of the wildcards and bundling the classpath into a jar file 
 containing a manifest with the Class-Path attribute. The other classpath 
 commands should do the same for consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2965) Enhance Node Managers to monitor and report the resource usage on machines

2015-01-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269921#comment-14269921
 ] 

Vinod Kumar Vavilapalli commented on YARN-2965:
---

Linking all related efforts.

Related to and very likely a dup of YARN-1012 which is part of a larger effort.

 Enhance Node Managers to monitor and report the resource usage on machines
 --

 Key: YARN-2965
 URL: https://issues.apache.org/jira/browse/YARN-2965
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: ddoc_RT.docx


 This JIRA is about augmenting Node Managers to monitor the resource usage on 
 the machine, aggregate these reports, and expose them to the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-01-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269926#comment-14269926
 ] 

Vinod Kumar Vavilapalli commented on YARN-2745:
---

Haven't read the design doc yet. Linking all related efforts so there are no 
duplicates.

Related to and very likely a dup of YARN-1011.


 Extend YARN to support multi-resource packing of tasks
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, scheduler
Reporter: Robert Grandl
Assignee: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
 tetris_paper.pdf


 In this umbrella JIRA we propose an extension to existing scheduling 
 techniques which accounts for all resources used by a task (CPU, memory, 
 disk, network) and is able to achieve three competing objectives: fairness, 
 improved cluster utilization, and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)