[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619551#comment-13619551 ]

Hadoop QA commented on YARN-447:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12576484/YARN-447-trunk.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/644//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/644//console

This message is automatically generated.

applicationComparator improvement for CS

Key: YARN-447
URL: https://issues.apache.org/jira/browse/YARN-447
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Assignee: nemon lou
Priority: Minor
Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch

Currently the compare code is:
return a1.getApplicationId().getId() - a2.getApplicationId().getId();
It will be replaced with:
return a1.getApplicationId().compareTo(a2.getApplicationId());
This brings two benefits:
1. The applicationId comparison logic is left to the ApplicationId class.
2. In a future HA mode the cluster timestamp may change, and the ApplicationId class already takes care of this condition.
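The change quoted above is small, but worth seeing in context. Below is a minimal, illustrative sketch of a comparator that delegates to ApplicationId.compareTo; the App holder class is hypothetical and only stands in for whatever application type the scheduler actually orders, so this is not the exact CapacityScheduler code.

{code}
import java.util.Comparator;

import org.apache.hadoop.yarn.api.records.ApplicationId;

public class ApplicationIdOrdering {

  // Hypothetical stand-in for the scheduler's application object.
  static class App {
    private final ApplicationId appId;
    App(ApplicationId appId) { this.appId = appId; }
    ApplicationId getApplicationId() { return appId; }
  }

  // Delegating to ApplicationId.compareTo keeps the ordering logic inside
  // ApplicationId (cluster timestamp first, then sequence number), so it
  // stays correct if the timestamp changes across RM restarts and avoids
  // the overflow-prone "a - b" int subtraction idiom.
  static final Comparator<App> APPLICATION_COMPARATOR = new Comparator<App>() {
    @Override
    public int compare(App a1, App a2) {
      return a1.getApplicationId().compareTo(a2.getApplicationId());
    }
  };
}
{code}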
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619573#comment-13619573 ]

Vinod Kumar Vavilapalli commented on YARN-447:

Latest patch looks good. +1. Checking this in.
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619578#comment-13619578 ]

Bikas Saha commented on YARN-392:

Sorry, I did not look at that patch carefully and assumed that it does what is suggested in https://issues.apache.org/jira/browse/YARN-392?focusedCommentId=13583713&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13583713 whereas it actually implements the proposal in YARN-398. The typical use case for blacklisting is to disable a set of nodes globally, e.g. never give me nodes A and B even when I ask for resources at *. Having to implement blacklisting on a per-priority basis will make the common case painful to work with. So I am not in favor of such a proposal unless there is a strong use case for blacklisting on specific priorities. Arun, Vinod and I had an offline discussion where we agreed that we are better off creating an API for blacklisting a set of nodes.

Make it possible to schedule to specific nodes without dropping locality

Key: YARN-392
URL: https://issues.apache.org/jira/browse/YARN-392
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Sandy Ryza
Attachments: YARN-392-1.patch, YARN-392.patch

Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.
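The "API for blacklisting a set of nodes" mentioned above is only being discussed at this point, so the shape below is purely hypothetical; it is meant only to make concrete that the request is global rather than per-priority.

{code}
import java.util.List;

// Hypothetical sketch of a global blacklist request; the real API shape is
// still under discussion in this JIRA and in YARN-398.
public interface BlacklistRequest {

  // Node (or rack) names the RM should stop allocating on, at any priority.
  List<String> getBlacklistAdditions();

  // Names previously blacklisted that the application wants to use again.
  List<String> getBlacklistRemovals();
}
{code}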
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619580#comment-13619580 ]

Hitesh Shah commented on YARN-193:

{code}
+and will get capped to this value. When it is set to -1, checking against the
+maximum allocation should be disable.</description>
{code}

I am not sure if we should allow disabling of the max memory and max vcores setting. Was it supported earlier or does the patch introduce this support?

Spelling mistake: alloacated

{code}
+LOG.info("Resource request was not able to be alloacated for"
+    + " application attempt " + appAttemptId + " because it"
+    + " failed to pass the validation. " + e.getMessage());
{code}

The above could be made simpler and briefer. For example: LOG.warn("Invalid resource ask by application " + appAttemptId, e); Also, please use LOG.level(message, throwable) when trying to log an exception.

{code}
+RPCUtil.getRemoteException(e);
{code}

The above is missing a throw. Likewise, in the handling of submitApplication, please change the log level to warn and also use the correct log function instead of using e.getMessage().

{code}
if (globalMaxAppAttempts <= 0) {
  throw new YarnException("The global max attempts should be a positive integer.");
}
{code}

Unrelated to this patch, but when throwing/logging errors related to configs, we should always point to the configuration property to let the user know which property needs to be changed. Please file a separate jira for the above. With respect to this, it may be useful to point to the property when throwing exceptions for invalid min/max memory/vcores.

Unnecessary import in RMAppAttemptImpl:

{code}
+import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils;
{code}

For InvalidResourceRequestException, the javadoc class description is missing.

Question - should normalization of resource requests be done inside the scheduler or in the ApplicationMasterService itself, which handles the allocate call?

If maxMemory or maxVcores is set to -1, what will happen when normalize() is called?

I think there are missing tests related to the use of DISABLE_RESOURCELIMIT_CHECK in both the validate and normalize functions that should have caught this error. In any case, the main question is whether DISABLE_RESOURCELIMIT_CHECK should actually be allowed.

Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

Key: YARN-193
URL: https://issues.apache.org/jira/browse/YARN-193
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Hitesh Shah
Assignee: Zhijie Shen
Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch
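To make the review comments about logging and the missing throw concrete, here is a small sketch of the pattern being asked for. SchedulerUtils, InvalidResourceRequestException and RPCUtil are the classes already named in this review, but the exact method signatures and surrounding context here are assumptions, not settled API.

{code}
// Sketch only: validate an ask, log at WARN with the throwable attached,
// and actually throw the remote exception instead of just constructing it.
private void checkAsk(ResourceRequest req, Resource maximumAllocation,
    ApplicationAttemptId appAttemptId) throws YarnRemoteException {
  try {
    // Validation helper as proposed by the patch; signature is an assumption.
    SchedulerUtils.validateResourceRequest(req, maximumAllocation);
  } catch (InvalidResourceRequestException e) {
    // Pass the throwable to the logger so the stack trace is preserved,
    // rather than concatenating e.getMessage() into the message string.
    LOG.warn("Invalid resource ask by application " + appAttemptId, e);
    // getRemoteException only builds the exception; it still has to be thrown.
    throw RPCUtil.getRemoteException(e);
  }
}
{code}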
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619581#comment-13619581 ]

Hudson commented on YARN-447:

Integrated in Hadoop-trunk-Commit #3547 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3547/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java

applicationComparator improvement for CS

Key: YARN-447
URL: https://issues.apache.org/jira/browse/YARN-447
Project: Hadoop YARN
Issue Type: Improvement
Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Assignee: nemon lou
Priority: Minor
Fix For: 2.0.5-beta
Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch, YARN-447-trunk.patch
[jira] [Commented] (YARN-444) Move special container exit codes from YarnConfiguration to API
[ https://issues.apache.org/jira/browse/YARN-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619584#comment-13619584 ]

Bikas Saha commented on YARN-444:

IMO, when the container exits because YARN took some specific action on it, e.g. killed due to preemption or killed due to memory, then YARN should assign an action-specific exit status using new values defined inside ContainerExitStatus. Currently, the NM kills the container and assigns its real exit code to the exit status, so at the AM it's hard to tell why the container exited. Of course, not as part of this jira.

Move special container exit codes from YarnConfiguration to API

Key: YARN-444
URL: https://issues.apache.org/jira/browse/YARN-444
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, applications/distributed-shell
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: YARN-444-1.patch, YARN-444.patch

YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101. These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants. Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.
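As a rough illustration of the "special handling" an AM writer might want once the constants live in a user-facing class, here is a self-contained sketch. The class and constant names are placeholders (the values are the ones quoted in the description above); the eventual API may differ.

{code}
// Placeholder constants mirroring the values quoted in the issue
// description; the real user-facing class may name these differently.
final class ExitCodes {
  static final int INVALID = -1000;
  static final int ABORTED = -100;
  static final int DISKS_FAILED = -101;
  private ExitCodes() {}
}

class CompletedContainerHandler {
  // Decide whether a completed container should count against the
  // application's failure budget, based on the special exit statuses.
  boolean countsAsFailure(int exitStatus) {
    if (exitStatus == ExitCodes.ABORTED || exitStatus == ExitCodes.DISKS_FAILED) {
      // The framework, not the application code, ended the container:
      // typically the work is just rescheduled elsewhere.
      return false;
    }
    return exitStatus != 0;
  }
}
{code}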
[jira] [Created] (YARN-527) Local filecache mkdir fails
Knut O. Hellan created YARN-527:

Summary: Local filecache mkdir fails
Key: YARN-527
URL: https://issues.apache.org/jira/browse/YARN-527
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.0.0-alpha
Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes.
Reporter: Knut O. Hellan
Priority: Minor

Jobs failed with no other explanation than this stack trace:

2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1364591875320_0017_m_00_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-4230789355400878397 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Manually creating the directory worked. This behavior was common to at least several nodes in the cluster. The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes. It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying.
[jira] [Updated] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut O. Hellan updated YARN-527:

Attachment: yarn-site.xml
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619690#comment-13619690 ]

Hudson commented on YARN-309:

Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/YarnServerBuilderUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java

Make RM provide heartbeat interval to NM

Key: YARN-309
URL: https://issues.apache.org/jira/browse/YARN-309
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Fix For: 2.0.5-beta
Attachments: YARN-309.10.patch, YARN-309.11.patch, YARN-309.1.patch, YARN-309-20130331.txt, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch, YARN-309.9.patch
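The shape of the change on the NM side is roughly the loop below. This is only a sketch inferred from the commit message and the touched files listed above; the getter on the heartbeat response and the resource-tracker wrapper are local hypothetical types, not verified YARN signatures.

{code}
// Sketch of the NM status-updater loop after YARN-309; names are assumptions.
interface HeartbeatResponse {
  long getNextHeartBeatInterval();   // interval dictated by the RM, in ms
}

interface ResourceTrackerClient {
  HeartbeatResponse nodeHeartbeat() throws Exception;
}

class StatusUpdaterLoop {
  private volatile boolean stopped;

  void run(ResourceTrackerClient tracker, long defaultIntervalMs) throws Exception {
    long nextIntervalMs = defaultIntervalMs;
    while (!stopped) {
      HeartbeatResponse response = tracker.nodeHeartbeat();
      // Instead of a fixed NM-side setting, the RM now tells the NM how long
      // to wait before the next heartbeat.
      nextIntervalMs = response.getNextHeartBeatInterval();
      Thread.sleep(nextIntervalMs);
    }
  }
}
{code}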
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619694#comment-13619694 ]

Hudson commented on YARN-516:

Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestContainerLocalizer.java

TestContainerLocalizer.testContainerLocalizerMain is failing

Key: YARN-516
URL: https://issues.apache.org/jira/browse/YARN-516
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Andrew Wang
Fix For: 2.0.5-beta
Attachments: YARN-516.txt
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619695#comment-13619695 ]

Hudson commented on YARN-524:

Integrated in Hadoop-Yarn-trunk #173 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/173/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300)
Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java

TestYarnVersionInfo failing if generated properties doesn't include an SVN URL

Key: YARN-524
URL: https://issues.apache.org/jira/browse/YARN-524
Project: Hadoop YARN
Issue Type: Bug
Components: api
Affects Versions: 3.0.0
Environment: OS/X with branch off github
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
Fix For: 3.0.0
Attachments: YARN-524.patch

{{TestYarnVersionInfo}} fails if the {{YarnVersionInfo.getUrl()}} call returns {{Unknown}}, when that is the value inserted into the property file.
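A sketch of the kind of tolerant assertion the fix implies, assuming JUnit 4. This is not the actual YARN-524 patch, just an illustration of checking the version info without insisting on an SVN URL being present.

{code}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.yarn.util.YarnVersionInfo;
import org.junit.Test;

public class TestYarnVersionInfoSketch {

  @Test
  public void testGetUrlTolerantOfUnknown() {
    String url = YarnVersionInfo.getUrl();
    assertTrue("getUrl returned empty", url != null && !url.isEmpty());
    // "Unknown" is what the generated properties contain for builds made
    // outside an SVN checkout, so only check for a real URL when one exists.
    if (!"Unknown".equals(url)) {
      assertTrue("unexpected SCM URL: " + url, url.contains("hadoop"));
    }
  }
}
{code}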
[jira] [Updated] (YARN-525) make CS node-locality-delay refreshable
[ https://issues.apache.org/jira/browse/YARN-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-525:

Issue Type: Improvement (was: Bug)
Summary: make CS node-locality-delay refreshable (was: yarn.scheduler.capacity.node-locality-delay doesn't change with rmadmin -refreshQueues)

make CS node-locality-delay refreshable

Key: YARN-525
URL: https://issues.apache.org/jira/browse/YARN-525
Project: Hadoop YARN
Issue Type: Improvement
Components: capacityscheduler
Affects Versions: 2.0.3-alpha, 0.23.7
Reporter: Thomas Graves

The config yarn.scheduler.capacity.node-locality-delay doesn't change when you change the value in capacity-scheduler.xml and then run yarn rmadmin -refreshQueues.
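A minimal sketch of what "refreshable" would mean here, assuming the scheduler re-reads the property inside the reinitialization path that yarn rmadmin -refreshQueues triggers. The class and field names are illustrative, not the actual patch.

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: re-read node-locality-delay on every queue refresh
// instead of caching it once at scheduler start-up.
class LocalityDelayHolder {
  private volatile int nodeLocalityDelay;

  // Called at start-up and again from the refresh path.
  void reinitialize(Configuration conf) {
    this.nodeLocalityDelay =
        conf.getInt("yarn.scheduler.capacity.node-locality-delay", -1);
  }

  int getNodeLocalityDelay() {
    return nodeLocalityDelay;
  }
}
{code}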
[jira] [Created] (YARN-528) Make IDs read only
Robert Joseph Evans created YARN-528:

Summary: Make IDs read only
Key: YARN-528
URL: https://issues.apache.org/jira/browse/YARN-528
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Robert Joseph Evans

I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and the YARN APIs can no longer be changed.
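For readers outside the patch, a bare-bones sketch of the "read-only ID" idea: final fields set once in the constructor, getters only, value-based equality. This is illustrative, not the actual YARN-528 code; the field names are only borrowed from ApplicationId for familiarity.

{code}
// Illustrative immutable ID, not the actual YARN-528 patch.
public final class ReadOnlyAppId implements Comparable<ReadOnlyAppId> {
  private final long clusterTimestamp;
  private final int id;

  public ReadOnlyAppId(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  public long getClusterTimestamp() { return clusterTimestamp; }
  public int getId() { return id; }

  @Override
  public int compareTo(ReadOnlyAppId other) {
    if (clusterTimestamp != other.clusterTimestamp) {
      return clusterTimestamp < other.clusterTimestamp ? -1 : 1;
    }
    return Integer.compare(id, other.id);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ReadOnlyAppId)) {
      return false;
    }
    ReadOnlyAppId other = (ReadOnlyAppId) o;
    return clusterTimestamp == other.clusterTimestamp && id == other.id;
  }

  @Override
  public int hashCode() {
    return 31 * (int) (clusterTimestamp ^ (clusterTimestamp >>> 32)) + id;
  }
}
{code}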
[jira] [Updated] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated YARN-528:

Attachment: YARN-528.txt

This patch contains changes to both Map/Reduce IDs as well as YARN APIs. I don't really want to split them up right now, but I am happy to file a separate JIRA for tracking purposes if the community decides this is a direction we want to go in.
[jira] [Assigned] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans reassigned YARN-528:

Assignee: Robert Joseph Evans
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619750#comment-13619750 ]

Hadoop QA commented on YARN-528:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12576553/YARN-528.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 49 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/645//console

This message is automatically generated.
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619761#comment-13619761 ]

Hudson commented on YARN-309:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346)
Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619765#comment-13619765 ]

Hudson commented on YARN-516:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362)
Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619766#comment-13619766 ]

Hudson commented on YARN-524:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300)
Result = FAILURE
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619768#comment-13619768 ]

Hudson commented on YARN-447:

Integrated in Hadoop-Hdfs-trunk #1362 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1362/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)
Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619778#comment-13619778 ]

Thomas Graves commented on YARN-392:

Bikas, when you say "creating an API for blacklisting a set of nodes", are you basically referring to YARN-398 or something else?
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619797#comment-13619797 ]

Knut O. Hellan commented on YARN-527:

Digging through the code, it looks to me like the native Java File.mkdirs is used to actually create the directory, and it will not give information about why it failed. If that is the case, then I guess this issue is actually a feature request: YARN should be better at cleaning up old file caches so that this situation will not happen.
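File.mkdirs() indeed only returns a boolean, so any extra detail has to be gathered by the caller. A small, self-contained sketch of that idea (not the actual FSDownload code) could look like this:

{code}
import java.io.File;
import java.io.IOException;

final class MkdirDiagnostics {

  // Try to create a directory and, on failure, report the most likely cause
  // instead of a bare "mkdir ... failed".
  static void mkdirsWithDiagnostics(File dir) throws IOException {
    if (dir.mkdirs() || dir.isDirectory()) {
      return;
    }
    StringBuilder msg = new StringBuilder("mkdir of " + dir + " failed");
    File parent = dir.getParentFile();
    if (dir.exists()) {
      msg.append(": path exists but is not a directory");
    } else if (parent != null && !parent.exists()) {
      msg.append(": parent directory does not exist");
    } else if (parent != null && !parent.canWrite()) {
      msg.append(": parent directory is not writable");
    }
    throw new IOException(msg.toString());
  }

  private MkdirDiagnostics() {}
}
{code}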
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619817#comment-13619817 ]

Hudson commented on YARN-309:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-309. Changed NodeManager to obtain heart-beat interval from the ResourceManager. Contributed by Xuan Gong. (Revision 1463346)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463346
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619821#comment-13619821 ]

Hudson commented on YARN-516:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-516. Fix failure in TestContainerLocalizer caused by HADOOP-9357. Contributed by Andrew Wang. (Revision 1463362)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463362
[jira] [Commented] (YARN-524) TestYarnVersionInfo failing if generated properties doesn't include an SVN URL
[ https://issues.apache.org/jira/browse/YARN-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619822#comment-13619822 ]

Hudson commented on YARN-524:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-524 TestYarnVersionInfo failing if generated properties doesn't include an SVN URL (Revision 1463300)
Result = SUCCESS
stevel : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463300
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619825#comment-13619825 ]

Hudson commented on YARN-475:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-475. Remove a unused constant in the public API - ApplicationConstants.AM_APP_ATTEMPT_ID_ENV. Contributed by Hitesh Shah. (Revision 1463033)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463033
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java

Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment

Key: YARN-475
URL: https://issues.apache.org/jira/browse/YARN-475
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Fix For: 2.0.5-beta
Attachments: YARN-475.1.patch

AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id.
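To illustrate the derivation the description above recommends, here is a short sketch. ApplicationConstants.AM_CONTAINER_ID_ENV and ConverterUtils.toContainerId are YARN API of this era, but treat the exact names and signatures as assumptions if your version differs.

{code}
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AttemptIdFromContainerId {
  public static void main(String[] args) {
    // The NM exports the AM container id in the environment; the attempt id
    // is derived from it instead of reading a dedicated env variable.
    String containerIdStr = System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV);
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    ApplicationAttemptId attemptId = containerId.getApplicationAttemptId();
    System.out.println("Application attempt: " + attemptId);
  }
}
{code}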
[jira] [Commented] (YARN-447) applicationComparator improvement for CS
[ https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619824#comment-13619824 ]

Hudson commented on YARN-447:

Integrated in Hadoop-Mapreduce-trunk #1389 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1389/])
YARN-447. Move ApplicationComparator in CapacityScheduler to use comparator in ApplicationId. Contributed by Nemon Lou. (Revision 1463405)
Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1463405
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619911#comment-13619911 ]

Robert Joseph Evans commented on YARN-528:

The build failed because it needs to be upmerged, again :(
[jira] [Updated] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated YARN-528:

Attachment: YARN-528.txt

Upmerged
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619957#comment-13619957 ] Zhijie Shen commented on YARN-193: -- {quote} I am not sure if we should allow disabling of the max memory and max vcores setting. Was it supported earlier or does the patch introduce this support? {quote} Yes, the patch introduces the support; it was already there in your previous patch. I inherited it and added some description in yarn-default.xml. I'm fine either way on whether the function needs to be supported. One risk I can imagine if the function is supported is that AM memory can exceed yarn.nodemanager.resource.memory-mb when DISABLE_RESOURCELIMIT_CHECK is set. Then, the problem described in YARN-389 will occur. {quote} Question - should normalization of resource requests be done inside the scheduler or in the ApplicationMasterService itself which handles the allocate call? {quote} I think it is better to do normalization outside allocate, because allocate is not only called from ApplicationMasterService, and normalize does not need to be called every time allocate is called. For example, RMAppAttemptImpl#ScheduleTransition#transition doesn't need to do normalization because the resource has already been validated during the submission stage. For another example, RMAppAttemptImpl#AMContainerAllocatedTransition#transition supplies an empty ask. {quote} Unrelated to this patch but when throwing/logging errors related to configs, we should always point to the configuration property to let the user know which property needs to be changed. Please file a separate jira for the above. {quote} I'll do that, and also fix the log message where the exception is thrown in this patch. {quote} For InvalidResourceRequestException, missing javadocs for class description. {quote} I'll add the description. {quote} If maxMemory or maxVcores is set to -1, what will happen when normalize() is called? {quote} The normalized value has no upper bound. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
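For context, a rough sketch of the normalization being discussed, not the patch itself: a request is rounded up to a multiple of the scheduler minimum and, unless the cap is disabled (-1 below, standing in for DISABLE_RESOURCELIMIT_CHECK), clamped to the maximum allocation.

{code}
final class NormalizeSketch {
  static final int DISABLED = -1;

  static int normalizeMemory(int requested, int minMB, int maxMB) {
    // Round up to the nearest multiple of the scheduler minimum.
    int normalized = ((Math.max(requested, minMB) + minMB - 1) / minMB) * minMB;
    if (maxMB == DISABLED) {
      return normalized;                 // no upper bound when the check is disabled
    }
    return Math.min(normalized, maxMB);  // otherwise clamp to the maximum allocation
  }

  public static void main(String[] args) {
    System.out.println(normalizeMemory(1500, 1024, 8192));      // 2048
    System.out.println(normalizeMemory(9000, 1024, 8192));      // 8192 (clamped)
    System.out.println(normalizeMemory(9000, 1024, DISABLED));  // 9216 (no cap)
  }
}
{code}

The last case illustrates the risk mentioned above: with the cap disabled, a normalized AM request can exceed what any NodeManager offers.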
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619989#comment-13619989 ] Hadoop QA commented on YARN-528: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576592/YARN-528.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 50 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/646//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/646//console This message is automatically generated. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Improvement Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is wel received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620008#comment-13620008 ] Bikas Saha commented on YARN-392: - Yes, YARN-398, but not the proposal currently in there. The alternative proposal is to have a new method in the AM-RM protocol using which the AM can blacklist nodes globally for all tasks (at all priorities) for that app. Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392-1.patch, YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
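The alternative proposal is only described in words above; as a purely hypothetical illustration (none of these class or method names exist in the protocol at this point), a global blacklist could travel with the allocate call roughly like this:

{code}
// Hypothetical sketch of the alternative proposal; all names below are invented
// for this example and are not part of the AM-RM protocol.
import java.util.ArrayList;
import java.util.List;

class BlacklistSketch {
  static class AllocateRequestSketch {
    private final List<String> blacklistAdditions = new ArrayList<String>();
    private final List<String> blacklistRemovals = new ArrayList<String>();

    // Nodes added here would be excluded for all of the app's requests,
    // at every priority, until removed again.
    void addToBlacklist(String node) { blacklistAdditions.add(node); }
    void removeFromBlacklist(String node) { blacklistRemovals.add(node); }

    List<String> getBlacklistAdditions() { return blacklistAdditions; }
    List<String> getBlacklistRemovals() { return blacklistRemovals; }
  }

  public static void main(String[] args) {
    AllocateRequestSketch request = new AllocateRequestSketch();
    request.addToBlacklist("badnode.example.com");
    System.out.println(request.getBlacklistAdditions());
  }
}
{code}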
[jira] [Updated] (YARN-122) CompositeService should clone the Configurations it passes to children
[ https://issues.apache.org/jira/browse/YARN-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-122: Priority: Minor (was: Major) CompositeService should clone the Configurations it passes to children -- Key: YARN-122 URL: https://issues.apache.org/jira/browse/YARN-122 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Priority: Minor Original Estimate: 0.5h Remaining Estimate: 0.5h {{CompositeService.init(Configuration)}} saves the configuration passed in *and* passes the same instance down to all managed services. This means a change in the configuration of one child could propagate to all the others. Unless this is desired, the configuration should be cloned for each child. Fast and easy fix; tests can be added to those coming in MAPREDUCE-4014 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
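As a rough sketch of the fix being suggested (not the actual CompositeService code), each child would receive its own copy of the configuration rather than the shared instance:

{code}
// Sketch only: a composite service that hands each child a copy of the
// configuration, so one child's changes cannot leak into another's.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

class CloningCompositeServiceSketch {
  interface ChildService {
    void init(Configuration conf);
  }

  private final List<ChildService> children = new ArrayList<ChildService>();

  void addService(ChildService child) {
    children.add(child);
  }

  void init(Configuration conf) {
    for (ChildService child : children) {
      // new Configuration(conf) copies the properties, isolating each child.
      child.init(new Configuration(conf));
    }
  }
}
{code}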
[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-529: - Summary: MR app master clean staging dir when reboot command sent from RM while the MR job succeeded (was: IF RM rebooted when MR job succeeded ) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he updated YARN-529: - Description: MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jian he reassigned YARN-529: Assignee: jian he MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
Steve Loughran created YARN-530: --- Summary: Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned YARN-530: --- Assignee: Steve Loughran Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-121) Yarn services to throw a YarnException on invalid state changes
[ https://issues.apache.org/jira/browse/YARN-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-121. - Resolution: Duplicate Fix Version/s: 3.0.0 Superseded by YARN-530 Yarn services to throw a YarnException on invalid state changes -- Key: YARN-121 URL: https://issues.apache.org/jira/browse/YARN-121 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 3.0.0 Original Estimate: 0.5h Remaining Estimate: 0.5h The {{EnsureCurrentState()}} checks of services throw an {{IllegalStateException}} if the state is wrong. If this were changed to {{YarnException}}, wrapper services such as CompositeService could relay it directly, instead of wrapping it in their own. The implementation time is mainly in changing the lifecycle test cases of the MAPREDUCE-3939 subtasks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
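A small sketch of the idea in this (now duplicate) ticket: fail invalid lifecycle transitions with a YARN-specific exception that wrapping services can relay directly. {{ServiceStateException}} is an illustrative name here, not the class the superseding work necessarily introduces.

{code}
class ServiceStateException extends RuntimeException {
  ServiceStateException(String message) {
    super(message);
  }
}

class LifecycleSketch {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  void ensureCurrentState(State expected) {
    if (state != expected) {
      // Previously an IllegalStateException; a dedicated type lets callers
      // distinguish lifecycle misuse from arbitrary programming errors.
      throw new ServiceStateException("Expected " + expected + " but was " + state);
    }
  }

  void start() {
    ensureCurrentState(State.INITED);
    state = State.STARTED;
  }
}
{code}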
[jira] [Resolved] (YARN-120) Make yarn-common services robust
[ https://issues.apache.org/jira/browse/YARN-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-120. - Resolution: Duplicate Fix Version/s: 3.0.0 Superseded by YARN-530 Make yarn-common services robust Key: YARN-120 URL: https://issues.apache.org/jira/browse/YARN-120 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Labels: yarn Fix For: 3.0.0 Attachments: MAPREDUCE-4014.patch Review the yarn common services ({{CompositeService}}, {{AbstractLivelinessMonitor}}) and make their service startup _and especially shutdown_ more robust against out-of-lifecycle invocation and partially complete initialization. Write tests for these where possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620057#comment-13620057 ] Bikas Saha commented on YARN-382: - +1 looks good to me. SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term, one so we don't have to keep adding things there when new resources are added, two because its a bit confusing as to what its doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
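The diff quoted in the description is easier to follow with a toy version of the aliasing it works around; {{Res}} and {{Ask}} below are stand-ins for the YARN Resource and ResourceRequest records, not the real classes.

{code}
// Toy illustration of why mutating the existing capability object differs from
// replacing it with a new, normalized one.
class Res {
  private int memory;
  Res(int memory) { this.memory = memory; }
  int getMemory() { return memory; }
  void setMemory(int memory) { this.memory = memory; }
}

class Ask {
  private Res capability;
  Ask(Res capability) { this.capability = capability; }
  Res getCapability() { return capability; }
  void setCapability(Res capability) { this.capability = capability; }
}

class NormalizeAliasingDemo {
  public static void main(String[] args) {
    Res original = new Res(1500);   // also referenced by the submission context
    Ask ask = new Ask(original);

    // Replacing the object leaves "original" (and anything else holding it) un-normalized:
    ask.setCapability(new Res(2048));
    System.out.println(original.getMemory()); // still 1500

    // Mutating the existing object updates every holder of the reference,
    // which is what the earlier change relied on:
    ask.setCapability(original);
    ask.getCapability().setMemory(2048);
    System.out.println(original.getMemory()); // 2048
  }
}
{code}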
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-117changes.pdf this is an overview of the changes, with explanations Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-530.patch This is the subset of YARN-117 for yarn-common Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620066#comment-13620066 ] Bikas Saha commented on YARN-193: - Can we check that we are getting the expected exception and not some other one? {code} +try { + rmService.submitApplication(submitRequest); + Assert.fail(Application submission should fail because); +} catch (YarnRemoteException e) { + // Exception is expected +} + } {code} Setting the same config twice? In second set, why not use a -ve value instead of the DISABLE value? Its not clear whether we want to disable check or set a -ve value. same for others. {code} +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0); +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, +ResourceCalculator.DISABLE_RESOURCELIMIT_CHECK); +try { + resourceManager.init(conf); + fail(Exception is expected because the min memory allocation is + + non-positive.); +} catch (YarnException e) { + // Exception is expected. {code} Lets also add a test for case when memory is more than max. Normalize should always reduce that to max. Same for DRF {code} +// max is not a multiple of min +maxResource = Resources.createResource(maxMemory - 10, 0); +ask.setCapability(Resources.createResource(maxMemory - 100)); +// multiple of minMemory maxMemory, then reduce to maxMemory +SchedulerUtils.normalizeRequest(ask, resourceCalculator, null, +minResource, maxResource); +assertEquals(maxResource.getMemory(), ask.getCapability().getMemory()); } {code} Rename testAppSubmitError() to show that its testing invalid resource request? TestAMRMClient. Why is this change needed? {code} +amResource.setMemory( +YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB); +amContainer.setResource(amResource); {code} Dont we need to throw? {code} + } catch (InvalidResourceRequestException e) { +LOG.info(Resource request was not able to be alloacated for + + application attempt + appAttemptId + because it + + failed to pass the validation. + e.getMessage()); +RPCUtil.getRemoteException(e); + } {code} typo {code} +// validate scheduler vcors allocation setting {code} This will need to be rebased after YARN-382 which I am going to commit shortly. I am fine with requiring that a max allocation limit be set. We should also make sure that max allocation from config can be matched by at least 1 machine in the cluster. That should be a different jira. IMO, Normalization should be called only inside the scheduler. It is an artifact of the scheduler logic. Nothing in the RM requires resources to be normalized to a multiple of min. Only the scheduler needs it to makes its life easier and it could choose to not do so. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
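One of the review points above is "check that we are getting the expected exception and not some other one". A self-contained sketch of that assertion style follows; the exception class below is a local stand-in so the example compiles on its own, whereas the patch would assert the real InvalidResourceRequestException.

{code}
class InvalidResourceRequestSketchException extends RuntimeException {
  InvalidResourceRequestSketchException(String message) { super(message); }
}

class SubmissionTestSketch {
  static void submitOversizedApp() {
    // Pretend submission path: a request above the max allocation is rejected.
    throw new InvalidResourceRequestSketchException("requested 16384 MB > max 8192 MB");
  }

  public static void main(String[] args) {
    try {
      submitOversizedApp();
      throw new AssertionError("Submission should have failed");
    } catch (InvalidResourceRequestSketchException expected) {
      // Catching the specific type (rather than Exception) means an unrelated
      // failure, e.g. a NullPointerException, would still fail the test.
      System.out.println("Rejected as expected: " + expected.getMessage());
    }
  }
}
{code}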
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.patch This is the across-all-yarn-projects patch (plus HADOOP-9447) just to show what the combined patch looks and tests like. YARN-530 contains the changes to yarn-common which should be the first step. (This patch contains those) Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117.patch Having played the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. state model prevents stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non null. Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers. h2. AbstractService state change doesn't defend against race conditions. There's no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph of lifecycle operations Helper methods to move things through lifecycles. init-start is common, stop-if-service!=null another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. 
These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. state transition failures are something that registered service listeners may wish to be informed of. When a state transition fails, a {{RuntimeException}} can be thrown, and the service listeners are not informed as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op on the existing implementations of the interface. h2. Service listener failures not handled Is this an error or not? Log and ignore may not be what is desired. *Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch
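The proposals above are easier to picture with a compressed sketch. This is not the attached patch, just an outline of final lifecycle methods delegating to protected inner ones, with {{stop()}} tolerated from any state; names and details are assumptions.

{code}
abstract class ServiceSketch {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  // Final entry points own the state checks, so subclasses cannot perform the
  // work before (or without) the transition being validated.
  public final synchronized void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("Cannot start from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  // stop() is valid from every state and must cope with partially initialized
  // fields, so resources acquired in init()/start() are always released.
  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return; // idempotent: duplicate stop requests are ignored
    }
    try {
      innerStop();
    } finally {
      state = State.STOPPED;
    }
  }

  protected void innerStart() {}
  protected void innerStop() {}
}
{code}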
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620091#comment-13620091 ] Bikas Saha commented on YARN-193: - Also, why are there so many normalize functions and why are we creating a new Resource object every time we normalize? We should fix this in a different jira though. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620092#comment-13620092 ] jian he commented on YARN-529: -- Several solutions: 1. Let the RM accept old attempts. In the current case, the RM will raise an exception because of the unrecognized attempt and consider the job unsuccessful. 2. Only clean the staging dir after the AM successfully unregisters with the RM. We can use a flag to indicate this, or modify the state machine so that on receiving JOB_AM_REBOOT it transitions from SUCCEEDED to REBOOT. The potential problem is that when the job transitions to the SUCCEEDED state, some job-success metrics have already been triggered. MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: jian he Assignee: jian he MR app master will clean staging dir, if the job is already succeeded and asked to reboot. RM will consider this job unsuccessful and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
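A minimal sketch of option 2 above, with invented names (shouldCleanStagingDir, unregisterFromRM), just to show the ordering: the staging dir is removed only once the RM has acknowledged the successful finish.

{code}
// Sketch of option 2 with invented names; not the MR AppMaster code.
class StagingCleanupSketch {
  private volatile boolean safeToCleanStagingDir = false;

  void shutDownJob() {
    try {
      unregisterFromRM();            // finishApplicationMaster()
      safeToCleanStagingDir = true;  // RM accepted the final status
    } catch (Exception e) {
      // Unregistration failed: keep the staging dir so a further attempt
      // launched by the RM can still recover the job.
    }
    if (safeToCleanStagingDir) {
      deleteStagingDir();
    }
  }

  void unregisterFromRM() throws Exception { /* RPC to the RM, omitted */ }
  void deleteStagingDir() { /* remove the job's staging directory, omitted */ }
}
{code}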
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620096#comment-13620096 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576617/YARN-530.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/647//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/647//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/647//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620097#comment-13620097 ] Hudson commented on YARN-382: - Integrated in Hadoop-trunk-Commit #3549 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3549/]) YARN-382. SchedulerUtils improve way normalizeRequest sets the resource capabilities (Zhijie Shen via bikas) (Revision 1463653) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1463653 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term, one so we don't have to keep adding things there when new resources are added, two because its a bit confusing as to what its doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-382) SchedulerUtils improve way normalizeRequest sets the resource capabilities
[ https://issues.apache.org/jira/browse/YARN-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-382: - Fix Version/s: 2.0.5-beta SchedulerUtils improve way normalizeRequest sets the resource capabilities -- Key: YARN-382 URL: https://issues.apache.org/jira/browse/YARN-382 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-382_1.patch, YARN-382_2.patch, YARN-382_demo.patch In YARN-370, we changed it from setting the capability to directly setting memory and cores: -ask.setCapability(normalized); +ask.getCapability().setMemory(normalized.getMemory()); +ask.getCapability().setVirtualCores(normalized.getVirtualCores()); We did this because it is directly setting the values in the original resource object passed in when the AM gets allocated and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details. I think we should find a better way of doing this long term, one so we don't have to keep adding things there when new resources are added, two because its a bit confusing as to what its doing and prone to someone accidentally breaking it in the future again. Something closer to what Arun suggested in YARN-370 would be better but we need to make sure all the places work and get some more testing on it before putting it in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-442) The ID classes should be immutable
[ https://issues.apache.org/jira/browse/YARN-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-442. -- Resolution: Duplicate Assignee: (was: Xuan Gong) YARN-528 is fixing this, closing as duplicate. The ID classes should be immutable -- Key: YARN-442 URL: https://issues.apache.org/jira/browse/YARN-442 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth ApplicationId, ApplicationAttemptId, ContainerId should be immutable. That should allow for a simpler implementation as well as remove synchronization. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620163#comment-13620163 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576620/YARN-117.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 28 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 33 warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy: org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup org.apache.hadoop.mapreduce.security.ssl.TestEncryptedShuffle org.apache.hadoop.mapred.TestNetworkedJob org.apache.hadoop.mapred.TestClusterMRNotification org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.mapred.TestMiniMRClasspath org.apache.hadoop.mapred.TestBlockLimits org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapreduce.security.TestMRCredentials org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.TestReduceFetchFromPartialMem org.apache.hadoop.mapred.TestMerge org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestJobName org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapred.TestClusterMapReduceTestCase 
org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.ipc.TestSocketFactory org.apache.hadoop.mapred.TestJobSysDirWithDFS org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/648//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/648//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings:
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620170#comment-13620170 ] Vinod Kumar Vavilapalli commented on YARN-528: -- bq. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. We have to make a call on this, don't think we explicitly took that decision yet. That said, I am inclined to throw it away but there were a couple of reasons why we put this (like being able to pass through unindentified fields for e.g. from new RM to new NM via old AM). I would like a day or two to dig into those with knowledgeable folks offline. Thanks for your patience. Oh, and let's separate the tickets into MR and YARN only changes please - there isn't any pain as they are all orthogonal changes. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just, makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is wel received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-529) MR app master clean staging dir when reboot command sent from RM while the MR job succeeded
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620230#comment-13620230 ] Bikas Saha commented on YARN-529: - By 1) you mean let RM accept finishApplicationAttempt() from the last attempt? MR app master clean staging dir when reboot command sent from RM while the MR job succeeded --- Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Summary: Succeeded MR job is retried by RM if finishApplicationMaster() call fails (was: Succeeded RM job is retried by RM if finishApplicationMaster() call fails) Succeeded MR job is retried by RM if finishApplicationMaster() call fails - Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded RM job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Summary: Succeeded RM job is retried by RM if finishApplicationMaster() call fails (was: MR app master clean staging dir when reboot command sent from RM while the MR job succeeded) Succeeded RM job is retried by RM if finishApplicationMaster() call fails - Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-529) Succeeded MR job is retried by RM if finishApplicationMaster() call fails
[ https://issues.apache.org/jira/browse/YARN-529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-529: Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-128) Succeeded MR job is retried by RM if finishApplicationMaster() call fails - Key: YARN-529 URL: https://issues.apache.org/jira/browse/YARN-529 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jian He Assignee: Jian He MR app master will clean staging dir, if the job is already succeeded and asked to reboot. If the finishApplicationMaster call fails, RM will consider this job unfinished and launch further attempts, further attempts will fail because staging dir is cleaned -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-527) Local filecache mkdir fails
[ https://issues.apache.org/jira/browse/YARN-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620239#comment-13620239 ] Vinod Kumar Vavilapalli commented on YARN-527: -- Is there any difference in how NodeManager tried to create the dir and your manual creation? Like the user running NM and user who manually created the dir? Can you reproduce this? If we can find out exactly why NM couldn't create it automatically, then we can do something about it. Local filecache mkdir fails --- Key: YARN-527 URL: https://issues.apache.org/jira/browse/YARN-527 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Environment: RHEL 6.3 with CDH4.1.3 Hadoop, HA with two name nodes and six worker nodes. Reporter: Knut O. Hellan Priority: Minor Attachments: yarn-site.xml Jobs failed with no other explanation than this stack trace: 2013-03-29 16:46:02,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diag nostics report from attempt_1364591875320_0017_m_00_0: java.io.IOException: mkdir of /disk3/yarn/local/filecache/-42307893 55400878397 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:932) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Manually creating the directory worked. This behavior was common to at least several nodes in the cluster. The situation was resolved by removing and recreating all /disk?/yarn/local/filecache directories on all nodes. It is unclear whether Yarn struggled with the number of files or if there were corrupt files in the caches. The situation was triggered by a node dying. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620261#comment-13620261 ] Xuan Gong commented on YARN-101: 1.Use YarnServerBuilderUtils for constructing node-heartbeat response 2.User BuilderUtils to create ApplicationId, ContainerId, ContainerStatus, etc 3.Recreated the test case as last comment suggested If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. 
LOG.error(Caught exception in status-updater, e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; ListContainerStatus containersStatuses = new ArrayListContainerStatus(); for (IteratorEntryContainerId, Container i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { EntryContainerId, Container e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info(Sending out status for container: + containerStatus); {color:red} // Here is the part that removes the completed containers. if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove
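The defect highlighted in the pasted snippet is that completed containers are dropped from the NM's map before the heartbeat is known to have succeeded. A hedged sketch of the usual remedy (all names invented, not the patch) is to hold completed statuses until the RM's response arrives and re-send them otherwise:

{code}
import java.util.ArrayList;
import java.util.List;

class HeartbeatSketch {
  private final List<String> pendingCompletedContainers = new ArrayList<String>();

  synchronized List<String> snapshotForHeartbeat(List<String> justCompleted) {
    pendingCompletedContainers.addAll(justCompleted);
    // Send a copy; keep the originals until the RM acknowledges them.
    return new ArrayList<String>(pendingCompletedContainers);
  }

  synchronized void onHeartbeatSucceeded(List<String> acknowledged) {
    pendingCompletedContainers.removeAll(acknowledged);
  }

  synchronized void onHeartbeatFailed() {
    // Nothing to do: the statuses are still pending and will be re-sent on the
    // next heartbeat instead of being silently dropped.
  }
}
{code}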
[jira] [Assigned] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-486: -- Assignee: Xuan Gong (was: Bikas Saha) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land - Key: YARN-486 URL: https://issues.apache.org/jira/browse/YARN-486 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Currently, id, resource request etc. need to be copied over from Container to ContainerLaunchContext. This can be brittle. Also it leads to duplication of information (such as Resource from CLC and Resource from Container and Container.tokens). Sending Container directly to startContainer solves these problems. It also makes CLC clean by only having stuff in it that is set by the client/AM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
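The shape of the proposed change, sketched with hypothetical interfaces; the real YARN types and signatures differ, and this is only meant to show where the duplication goes away:
{code:java}
// Hypothetical sketch of the API shape discussed in YARN-486. The names mirror
// YARN concepts but are illustrative, not the real org.apache.hadoop.yarn API.
interface ContainerLaunchContextSketch {
  // holds only what the client/AM itself sets: commands, env, local resources ...
}

interface ContainerSketch {
  String getId();
  int getMemoryMb();
  byte[] getContainerToken();
}

interface NodeManagerClientSketch {
  // Before: the caller copied id/resource/tokens from Container into the CLC,
  // which could drift out of sync with what the RM actually allocated.
  void startContainer(ContainerLaunchContextSketch clcWithCopiedFields);

  // After: pass the RM-issued Container alongside a CLC that carries only
  // client-provided data, so nothing needs to be duplicated.
  void startContainer(ContainerSketch container, ContainerLaunchContextSketch clc);
}
{code}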
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620275#comment-13620275 ] Siddharth Seth commented on YARN-528: - Yep, we'll likely only support a single serialization, which at this point is PB. What the current approach was supposed to be good at: 1. Handling unknown fields (which proto already supports), which could make rolling upgrades etc. easier. 2. Wrapping the object which came over the wire - with a goal of creating fewer objects. I don't think the second point was really achieved, with the implementation getting complicated because of the interfaces being mutable, lists, and supporting chained sets (clc.getResource().setMemory()). I think point one should continue to be maintained. Do we want *Proto references in the APIs (client library versus Java protocol definition)? At the moment, these are only referenced in the PBImpls - and hidden by the abstraction layer. What I don't like about the patch is Protos leaking into the object constructors. Instead, I think we could just use simple Java objects, with conversion at the RPC layer (I believe this is similar to the HDFS model). Unknown fields can be handled via byte[] arrays. I'm guessing very few of the interfaces actually need to be mutable - so in that sense, yes, this needs to be done before beta. OTOH, changing the PBImpl itself can be done at a later point if required. (Earlier is of course better, and I'd be happy to help with this. I was planning on working on YARN-442 before you started this work.) Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
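A rough sketch of the alternative described in the comment, i.e. plain immutable Java objects with protobuf conversion confined to the RPC layer and unknown fields carried as opaque bytes; the names are illustrative and not from any YARN patch:
{code:java}
import java.util.Arrays;

// Sketch only: an immutable ID object with no protobuf types in its API.
// The RPC layer would build the wire proto from these fields (re-attaching
// unknownFields) immediately before serialization.
final class AppIdSketch {
  private final long clusterTimestamp;
  private final int id;
  private final byte[] unknownFields; // opaque bytes carried through upgrades

  AppIdSketch(long clusterTimestamp, int id, byte[] unknownFields) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
    this.unknownFields = unknownFields == null
        ? new byte[0] : Arrays.copyOf(unknownFields, unknownFields.length);
  }

  long getClusterTimestamp() { return clusterTimestamp; }
  int getId() { return id; }
  byte[] getUnknownFields() { return Arrays.copyOf(unknownFields, unknownFields.length); }
  // No setters: the object is read only, as proposed in this JIRA.
}
{code}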
[jira] [Updated] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-479: - Attachment: YARN-479.5.patch NM retry behavior for connection to RM should be similar for lost heartbeats Key: YARN-479 URL: https://issues.apache.org/jira/browse/YARN-479 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620284#comment-13620284 ] Hadoop QA commented on YARN-101: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576650/YARN-101.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/649//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/649//console This message is automatically generated. If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java
protected void startStatusUpdater() {
  new Thread("Node Status Updater") {
    @Override
    @SuppressWarnings("unchecked")
    public void run() {
      int lastHeartBeatID = 0;
      while (!isStopped) {
        // Send heartbeat
        try {
          synchronized (heartbeatMonitor) {
            heartbeatMonitor.wait(heartBeatInterval);
          }
          {color:red}
          // Before we send the heartbeat, we get the NodeStatus,
          // whose method removes completed containers.
          NodeStatus nodeStatus = getNodeStatus();
          {color}
          nodeStatus.setResponseId(lastHeartBeatID);
          NodeHeartbeatRequest request = recordFactory
              .newRecordInstance(NodeHeartbeatRequest.class);
          request.setNodeStatus(nodeStatus);
          {color:red}
          // But if the nodeHeartbeat fails, we have already removed the
          // completed containers, so the RM never gets to know about them.
          // We aren't handling the nodeHeartbeat failure case here.
          HeartbeatResponse response =
              resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();
          {color}
          if (response.getNodeAction() == NodeAction.SHUTDOWN) {
            LOG.info("Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat,"
                + " hence shutting down.");
            NodeStatusUpdaterImpl.this.stop();
            break;
          }
          if (response.getNodeAction() == NodeAction.REBOOT) {
            LOG.info("Node is out of sync with ResourceManager,"
                + " hence rebooting.");
            NodeStatusUpdaterImpl.this.reboot();
            break;
          }
          lastHeartBeatID = response.getResponseId();
          List<ContainerId> containersToCleanup = response
              .getContainersToCleanupList();
          if (containersToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedContainersEvent(containersToCleanup));
          }
          List<ApplicationId> appsToCleanup =
              response.getApplicationsToCleanupList();
          // Only start tracking for keepAlive on FINISH_APP
          trackAppsForKeepAlive(appsToCleanup);
          if (appsToCleanup.size() != 0) {
            dispatcher.getEventHandler().handle(
                new CMgrCompletedAppsEvent(appsToCleanup));
          }
        } catch (Throwable e) {
          // TODO Better error handling. Thread can die with the rest of the
          // NM still running.
[jira] [Commented] (YARN-479) NM retry behavior for connection to RM should be similar for lost heartbeats
[ https://issues.apache.org/jira/browse/YARN-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620291#comment-13620291 ] Hadoop QA commented on YARN-479: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576654/YARN-479.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/650//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/650//console This message is automatically generated. NM retry behavior for connection to RM should be similar for lost heartbeats Key: YARN-479 URL: https://issues.apache.org/jira/browse/YARN-479 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-479.1.patch, YARN-479.2.patch, YARN-479.3.patch, YARN-479.4.patch, YARN-479.5.patch Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620326#comment-13620326 ] Robert Joseph Evans commented on YARN-528: -- I am fine with splitting the MR changes from the YARN change. Like I said, I put this out here more as a question of how we want to go about implementing these changes, and the test was more of a prototype example. I personally lean more towards using the *Proto classes directly. Why have something else wrapping it if we don't need it, even if it is a small and simple layer? The only reason I did not go that route here is because of toString(). With the IDs we rely on having ID.toString() turn into something very specific that can be parsed and turned back into an instance of the object. If I had the time I would trace down all places where we call toString on them and replace it with something else. I may just scale back the scope of the patch to look at ApplicationID to begin with and try to see if I can accomplish this. bq. Wrapping the object which came over the wire - with a goal of creating fewer objects. I really don't understand how this is supposed to work. How do we create fewer objects by wrapping them in more objects? I can see us doing something like deduping the objects that come over the wire, but I don't see how wrapping works here. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
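The toString() dependency mentioned above is that YARN IDs print in a fixed, parseable form (an ApplicationId renders roughly as application_&lt;clusterTimestamp&gt;_&lt;sequence&gt;), and various places parse that string back. A small illustrative re-implementation of that round trip, not the actual YARN code:
{code:java}
// Illustrative only: the round trip the comment relies on.
final class AppIdFormat {
  static String format(long clusterTimestamp, int id) {
    // e.g. application_1363456789_0001
    return String.format("application_%d_%04d", clusterTimestamp, id);
  }

  static long[] parse(String s) {
    String[] parts = s.split("_");
    if (parts.length != 3 || !"application".equals(parts[0])) {
      throw new IllegalArgumentException("Not an application id: " + s);
    }
    // returns {clusterTimestamp, sequence}
    return new long[] { Long.parseLong(parts[1]), Long.parseLong(parts[2]) };
  }
}
{code}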
[jira] [Created] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
Siddharth Seth created YARN-532: --- Summary: RMAdminProtocolPBClientImpl should implement Closeable Key: YARN-532 URL: https://issues.apache.org/jira/browse/YARN-532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
[ https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-532: Attachment: YARN-532.txt Trivial fix. RMAdminProtocolPBClientImpl should implement Closeable -- Key: YARN-532 URL: https://issues.apache.org/jira/browse/YARN-532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: YARN-532.txt Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
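The kind of change involved, sketched with placeholder names: the real class wraps a protobuf RPC proxy and its close() would hand that proxy to RPC.stopProxy. This is not the actual RMAdminProtocolPBClientImpl:
{code:java}
import java.io.Closeable;
import java.io.IOException;

// Sketch only: a PB client implementation that owns an RPC proxy implements
// Closeable so callers (and RPC.stopProxy) have a well-defined way to release it.
class PbClientSketch implements Closeable {
  private Object proxy; // stands in for the protobuf RPC proxy

  @Override
  public void close() throws IOException {
    if (proxy != null) {
      // In Hadoop this is where the proxy would be stopped; shown here only
      // as the single place where the underlying connection is released.
      proxy = null;
    }
  }
}
{code}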
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620348#comment-13620348 ] Siddharth Seth commented on YARN-528: - bq. I really don't understand how this is supposed to work. How do we create fewer objects by wrapping them in more objects? I can see us doing something like deduping the objects that come over the wire, but I don't see how wrapping works here. Not compared to using Protos directly (which wasn't really an option), but compared to the alternative of converting only at the RPC layer. Make IDs read only -- Key: YARN-528 URL: https://issues.apache.org/jira/browse/YARN-528 Project: Hadoop YARN Issue Type: Sub-task Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: YARN-528.txt, YARN-528.txt I really would like to rip out most if not all of the abstraction layer that sits in-between Protocol Buffers, the RPC, and the actual user code. We have no plans to support any other serialization type, and the abstraction layer just makes it more difficult to change protocols, makes changing them more error prone, and slows down the objects themselves. Completely doing that is a lot of work. This JIRA is a first step towards that. It makes the various ID objects immutable. If this patch is well received I will try to go through other objects/classes of objects and update them in a similar way. This is probably the last time we will be able to make a change like this before 2.0 stabilizes and YARN APIs will not be able to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-532) RMAdminProtocolPBClientImpl should implement Closeable
[ https://issues.apache.org/jira/browse/YARN-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620362#comment-13620362 ] Hadoop QA commented on YARN-532: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576674/YARN-532.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/651//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/651//console This message is automatically generated. RMAdminProtocolPBClientImpl should implement Closeable -- Key: YARN-532 URL: https://issues.apache.org/jira/browse/YARN-532 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: YARN-532.txt Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.12.patch 1. Remove the DISABLE_RESOURCELIMIT_CHECK feature, and its related test cases. 2. Rewrite the log messages, and output them through LOG.warn. 3. Add javadocs for InvalidResourceRequestException. 4. Check whether thrown exception is InvalidResourceRequestException in TestClientRMService. 5. Add the test case of ask max in TestSchedulerUtils. 6. Fixed other minor issues commented by Bikas and Hitesh (e.g., typo, unnecessary import). 7. Rebase with YARN-382. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
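A minimal sketch of the validation described in the comment, i.e. rejecting requests above the maximum allocation with an exception whose message names the offending values; the class and method names here are illustrative rather than the actual YARN-193 code:
{code:java}
// Sketch only: fail fast instead of silently normalizing an over-sized request.
class InvalidResourceRequestExceptionSketch extends Exception {
  InvalidResourceRequestExceptionSketch(String msg) { super(msg); }
}

final class SchedulerUtilsSketch {
  static void validateResourceRequest(int requestedMb, int maxMb)
      throws InvalidResourceRequestExceptionSketch {
    if (requestedMb > maxMb) {
      throw new InvalidResourceRequestExceptionSketch(
          "Invalid resource request, requested memory " + requestedMb
          + " MB is larger than the maximum allocation of " + maxMb + " MB");
    }
  }
}
{code}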
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.patch Fixing below issues 1) all the formatting issues 2) adding one additional test case for checking Directory state transition from FULL-NON_FULL-FULL 3) javadoc warnings Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620384#comment-13620384 ] Hadoop QA commented on YARN-467: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576681/yarn-467-20130402.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/652//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/652//console This message is automatically generated. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620385#comment-13620385 ] Hadoop QA commented on YARN-193: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576680/YARN-193.12.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/653//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/653//console This message is automatically generated. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.1.patch fixing test issue... that check is no longer valid. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620412#comment-13620412 ] Hadoop QA commented on YARN-467: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576688/yarn-467-20130402.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/654//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/654//console This message is automatically generated. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-533) Pointing to the config property when throwing/logging the config-related exception
Zhijie Shen created YARN-533: Summary: Pointing to the config property when throwing/logging the config-related exception Key: YARN-533 URL: https://issues.apache.org/jira/browse/YARN-533 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen When throwing/logging errors related to configuration, we should always point to the configuration property to let users know which property needs to be changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
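A small sketch of the convention being proposed; the property name and the particular check are only examples:
{code:java}
// Sketch only: when a configuration value is invalid, name the property in the
// message so the user knows exactly what to change.
final class ConfigCheckSketch {
  static int requirePositive(String propertyName, int value) {
    if (value <= 0) {
      // e.g. propertyName = "yarn.nodemanager.resource.memory-mb" (example only)
      throw new IllegalArgumentException(
          "Invalid value " + value + " for " + propertyName
          + "; it must be a positive integer");
    }
    return value;
  }
}
{code}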
[jira] [Updated] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-495: - Attachment: YARN-495.2.patch Containers are not terminated when the NM is rebooted - Key: YARN-495 URL: https://issues.apache.org/jira/browse/YARN-495 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-495.1.patch, YARN-495.2.patch When a reboot command is sent from RM, the node manager doesn't clean up the containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620443#comment-13620443 ] Jian He commented on YARN-495: -- Uploaded a patch that changes the NM behavior from REBOOT to RESYNC when the RM is restarted. Containers are not terminated when the NM is rebooted - Key: YARN-495 URL: https://issues.apache.org/jira/browse/YARN-495 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-495.1.patch, YARN-495.2.patch When a reboot command is sent from RM, the node manager doesn't clean up the containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
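A hedged sketch of the intended behavior, namely terminating running containers before the NM shuts down or resyncs with a restarted RM; the names below are illustrative and this is not the YARN-495 patch:
{code:java}
// Illustrative only: on a RESYNC (or reboot) signal, first ask the container
// manager to kill whatever is still running, then rejoin the RM.
interface ContainerCleaner {
  void cleanupContainersOnResync(); // e.g. dispatch a "kill all containers" event
}

final class ResyncHandlerSketch {
  private final ContainerCleaner cleaner;

  ResyncHandlerSketch(ContainerCleaner cleaner) { this.cleaner = cleaner; }

  void onResync(Runnable reregisterWithRm) {
    cleaner.cleanupContainersOnResync(); // terminate containers first
    reregisterWithRm.run();              // then re-register with the restarted RM
  }
}
{code}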
[jira] [Commented] (YARN-495) Containers are not terminated when the NM is rebooted
[ https://issues.apache.org/jira/browse/YARN-495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620448#comment-13620448 ] Hadoop QA commented on YARN-495: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576695/YARN-495.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/655//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/655//console This message is automatically generated. Containers are not terminated when the NM is rebooted - Key: YARN-495 URL: https://issues.apache.org/jira/browse/YARN-495 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-495.1.patch, YARN-495.2.patch When a reboot command is sent from RM, the node manager doesn't clean up the containers while its stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-534) AM max attempts is not checked when RM restart and try to recover attempts
Jian He created YARN-534: Summary: AM max attempts is not checked when RM restart and try to recover attempts Key: YARN-534 URL: https://issues.apache.org/jira/browse/YARN-534 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the max attempt fails, it will not clean the state store; when the RM comes back, it will retry the attempt again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
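A sketch of the missing check described above: on recovery, compare the number of attempts already stored for an application against the configured maximum before launching another one. The names are illustrative, not the eventual patch:
{code:java}
// Sketch only: guard against re-launching attempts for an application that had
// already exhausted its attempts before the RM restart.
final class RecoverySketch {
  static boolean shouldLaunchNewAttempt(int recoveredAttempts, int maxAppAttempts) {
    // Without this guard the restarted RM keeps retrying indefinitely.
    return recoveredAttempts < maxAppAttempts;
  }
}
{code}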
[jira] [Updated] (YARN-458) Resource manager address must be placed in four different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Attachment: YARN-458.patch Resource manager address must be placed in four different configs - Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) Resource manager address must be placed in four different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620489#comment-13620489 ] Sandy Ryza commented on YARN-458: - Uploaded a patch that adds yarn.resourcemanager.hostname and yarn.nodemanager.hostname properties, and changes all the other configs to use ${yarn.resourcemanager.hostname} and ${yarn.nodemanager.hostname}. Resource manager address must be placed in four different configs - Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
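How a single hostname property can drive the other addresses through Hadoop Configuration's ${...} variable expansion; the values below are examples, not the exact yarn-default.xml entries added by the patch:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class RmAddressExample {
  public static void main(String[] args) {
    // Empty Configuration so the example is self-contained.
    Configuration conf = new Configuration(false);
    conf.set("yarn.resourcemanager.hostname", "rm.example.com");
    // A default such as "${yarn.resourcemanager.hostname}:8030" is expanded
    // when the property is read, so only the hostname needs to be set.
    conf.set("yarn.resourcemanager.scheduler.address",
        "${yarn.resourcemanager.hostname}:8030");
    System.out.println(conf.get("yarn.resourcemanager.scheduler.address"));
    // prints rm.example.com:8030
  }
}
{code}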
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Summary: YARN daemon addresses must be placed in many different configs (was: Resource manager address must be placed in four different configs) YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Description: The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. was: The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. It would be much easier if they could simply specify yarn.resourcemanager.address and default ports for the other ones would kick in. YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Affects Version/s: 2.0.3-alpha YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-458: Component/s: resourcemanager nodemanager YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-458) YARN daemon addresses must be placed in many different configs
[ https://issues.apache.org/jira/browse/YARN-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620501#comment-13620501 ] Hadoop QA commented on YARN-458: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576699/YARN-458.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/656//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/656//console This message is automatically generated. YARN daemon addresses must be placed in many different configs -- Key: YARN-458 URL: https://issues.apache.org/jira/browse/YARN-458 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-458.patch The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address A new user trying to configure a cluster needs to know the names of all these four configs. The same issue exists for nodemanagers. It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620535#comment-13620535 ] Omkar Vinit Joshi commented on YARN-467: I have tested this code for the below scenarios:
* I used 4 local-dirs to see that localization gets distributed across them and that LocalCacheDirectoryManager manages each of them separately.
* I tested various values of yarn.nodemanager.local-cache.max-files-per-directory (36, 37, 40 and much larger).
* I modified the cache cleanup interval and the cache target size in MB to see older files getting removed from the cache and LocalCacheDirectoryManager's sub-directories getting reused.
* I verified that we never run into a situation where any local-directory contains more files or sub-directories than what is specified in the configuration.
Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
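For illustration, a minimal sketch of the idea behind spreading cache entries over a bounded directory hierarchy; this is not the LocalCacheDirectoryManager implementation, and the layout and names are assumptions:
{code:java}
// Sketch only: map an increasing counter to a relative sub-path so that any
// single directory only ever receives a bounded number of entries
// (cf. yarn.nodemanager.local-cache.max-files-per-directory).
final class CacheDirSketch {
  private final int maxPerDir;
  private long counter = 0;

  CacheDirSketch(int maxPerDir) { this.maxPerDir = maxPerDir; }

  // The first maxPerDir entries stay in the root (""); after that each block of
  // maxPerDir entries shares one directory, and directory indices are encoded
  // in bijective base-maxPerDir so every path component stays in [0, maxPerDir-1].
  synchronized String nextSubDir() {
    long dirIndex = counter++ / maxPerDir;
    StringBuilder path = new StringBuilder();
    while (dirIndex > 0) {
      dirIndex--;
      path.insert(0, "/" + (dirIndex % maxPerDir));
      dirIndex /= maxPerDir;
    }
    return path.length() == 0 ? "" : path.substring(1);
  }
}
{code}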
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130402.2.patch Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620546#comment-13620546 ] Hadoop QA commented on YARN-467: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576705/yarn-467-20130402.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/657//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/657//console This message is automatically generated. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-101: --- Attachment: YARN-101.6.patch recreate test case to verify status of all containers in every heartbeat If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. LOG.error(Caught exception in status-updater, e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; ListContainerStatus containersStatuses = new ArrayListContainerStatus(); for (IteratorEntryContainerId, Container i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { EntryContainerId, Container e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info(Sending out status for container: + containerStatus); {color:red} // Here is the part that removes the completed containers. 
if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color} LOG.info("Removed completed container " + containerId); } }
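The gist of the fix being iterated on here, shown only as a hedged sketch rather than the actual patch, is to stop dropping completed containers at the moment the node status is built and instead purge them only after the ResourceManager has acknowledged the heartbeat that reported them. The helper below uses simple stand-in types; the real code works with ContainerId/ContainerStatus inside NodeStatusUpdaterImpl.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompletedContainerBuffer {
  /** Hypothetical stand-in for a container's last reported status. */
  static class Status {
    final String containerId;
    final boolean complete;
    Status(String containerId, boolean complete) {
      this.containerId = containerId;
      this.complete = complete;
    }
  }

  private final Map<String, Status> containers = new HashMap<String, Status>();
  private final List<String> pendingRemoval = new ArrayList<String>();

  /** Build statuses for the next heartbeat; completed containers are remembered, not removed. */
  List<Status> statusesForHeartbeat() {
    List<Status> out = new ArrayList<Status>(containers.values());
    pendingRemoval.clear();
    for (Status s : out) {
      if (s.complete) {
        pendingRemoval.add(s.containerId);   // keep until the RM has seen it
      }
    }
    return out;
  }

  /** Call only after resourceTracker.nodeHeartbeat(...) has returned successfully. */
  void heartbeatSucceeded() {
    for (String id : pendingRemoval) {
      containers.remove(id);
    }
    pendingRemoval.clear();
  }
}
{code}
If the heartbeat RPC throws, heartbeatSucceeded() is simply never called, so the completed containers are reported again in the next heartbeat instead of being lost.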
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620608#comment-13620608 ] Vinod Kumar Vavilapalli commented on YARN-467: -- Perfect, the latest patch looks good. Checking it in. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620610#comment-13620610 ] Hadoop QA commented on YARN-101: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576714/YARN-101.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/658//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/658//console This message is automatically generated. If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch, YARN-101.5.patch, YARN-101.6.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. 
HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620617#comment-13620617 ] Hudson commented on YARN-467: - Integrated in Hadoop-trunk-Commit #3552 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3552/]) YARN-467. Modify public distributed cache to localize files such that no local directory hits unix file count limits and thus prevent job failures. Contributed by Omkar Vinit Joshi. (Revision 1463823) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1463823 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.0.5-beta Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch, yarn-467-20130401.patch, yarn-467-20130402.1.patch, yarn-467-20130402.2.patch, yarn-467-20130402.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620638#comment-13620638 ] Bikas Saha commented on YARN-193: - The default value of 32 for max-vcores might be too high. Why is the conf being set twice for each value? Same for vcores. {code} +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 2048); +conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 1024); +try { + resourceManager.init(conf); + fail("Exception is expected because the min memory allocation is " + "larger than the max memory allocation."); +} catch (YarnException e) { + // Exception is expected. +} {code} Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
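For clarity, here is what the quoted test presumably intended (a sketch of the likely correction, not the actual follow-up patch): the second call should target the maximum allocation, so that min > max and ResourceManager.init() is expected to reject the configuration. The maximum-allocation constant name is assumed here by analogy with the minimum one used in the snippet.
{code}
conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 2048);
conf.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, 1024);  // max < min, intentionally invalid
try {
  resourceManager.init(conf);
  fail("Exception is expected because the min memory allocation is "
      + "larger than the max memory allocation.");
} catch (YarnException e) {
  // Expected: an invalid min/max pair must be rejected at init time.
}
{code}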
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620651#comment-13620651 ] Zhijie Shen commented on YARN-193: -- {quote} The default value of 32 for max-vcores might be too high. {quote} Why was 32 chosen originally? In http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/, it says: 2012 – 16+ cores, 48-96GB of RAM, 12x2TB or 12x3TB of disk. How about choosing 16? {quote} Why is the conf being set twice for each value? Same for vcores. {quote} I'll fix the bug. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.10.patch, YARN-193.11.patch, YARN-193.12.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
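To tie the discussion back to the title of the issue, a minimal sketch of memory normalization with a maximum cap follows. This is only an assumed shape of the behaviour (the actual patch may instead reject requests that exceed the maximum rather than clamp them): round the request up to a multiple of the minimum allocation, then keep it within [min, max].
{code}
public final class NormalizeSketch {
  /** Round requestedMb up to a multiple of minMb, then clamp into [minMb, maxMb]. */
  static int normalizeMemory(int requestedMb, int minMb, int maxMb) {
    int rounded = ((requestedMb + minMb - 1) / minMb) * minMb;  // round up to the next increment
    return Math.min(Math.max(rounded, minMb), maxMb);           // never below min, never above max
  }

  public static void main(String[] args) {
    // With min = 1024 MB and max = 4096 MB:
    System.out.println(normalizeMemory(100, 1024, 4096));   // 1024 (rounded up to the minimum)
    System.out.println(normalizeMemory(5000, 1024, 4096));  // 4096 (capped at the maximum)
  }
}
{code}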