[jira] [Commented] (YARN-178) Fix custom ProcessTree instance creation
[ https://issues.apache.org/jira/browse/YARN-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482247#comment-13482247 ] Bikas Saha commented on YARN-178: - Then why add it to the constructor of the abstract base class? Fix custom ProcessTree instance creation Key: YARN-178 URL: https://issues.apache.org/jira/browse/YARN-178 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.5 Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Attachments: pstree-instance2.txt, pstree-instance.txt 1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable. 2. The process tree does not extend Configured as it should. Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
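The point under discussion in YARN-178 is that a pluggable ResourceCalculatorProcessTree can only monitor a container if it knows the container's root process id, so the pid ends up in the constructor of the abstract base class and custom implementations are expected to provide a matching constructor that the NodeManager can invoke reflectively. Below is a minimal sketch of that shape; the class and method names are simplified placeholders, not the exact Hadoop API.
{code}
// Illustrative sketch only: names are simplified, not the real Hadoop classes.
import java.lang.reflect.Constructor;

abstract class ProcessTree {
  protected final String pid;             // root process id of the container

  protected ProcessTree(String pid) {     // subclasses must expose this shape
    this.pid = pid;
  }

  abstract void updateProcessTree();      // refresh the tree rooted at pid
}

final class ProcessTreeFactory {
  /** Instantiate a plugin class through its (String pid) constructor. */
  static ProcessTree create(Class<? extends ProcessTree> clazz, String pid)
      throws ReflectiveOperationException {
    Constructor<? extends ProcessTree> c = clazz.getConstructor(String.class);
    return c.newInstance(pid);
  }
}
{code}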
[jira] [Commented] (YARN-174) TestNodeStatusUpdater is failing in trunk
[ https://issues.apache.org/jira/browse/YARN-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482251#comment-13482251 ] Hudson commented on YARN-174: - Integrated in Hadoop-Yarn-trunk #12 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/12/]) YARN-174. Modify NodeManager to pass the user's configuration even when rebooting. Contributed by Vinod Kumar Vavilapalli. (Revision 1401086) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401086 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java TestNodeStatusUpdater is failing in trunk - Key: YARN-174 URL: https://issues.apache.org/jira/browse/YARN-174 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Robert Joseph Evans Assignee: Vinod Kumar Vavilapalli Fix For: 2.0.2-alpha, 0.23.5 Attachments: YARN-174-20121022.txt, YARN-174.patch {noformat} 2012-10-19 12:18:23,941 FATAL [Node Status Updater] nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(277)) - Error starting NodeManager org.apache.hadoop.yarn.YarnException: ${yarn.log.dir}/userlogs is not a valid path. Path should be with file scheme or without scheme at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.validatePaths(LocalDirsHandlerService.java:321) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.init(LocalDirsHandlerService.java:95) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.init(LocalDirsHandlerService.java:123) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.init(NodeHealthCheckerService.java:48) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:165) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:274) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stateChanged(NodeManager.java:256) at org.apache.hadoop.yarn.service.AbstractService.changeState(AbstractService.java:163) at org.apache.hadoop.yarn.service.AbstractService.stop(AbstractService.java:112) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.stop(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.reboot(NodeStatusUpdaterImpl.java:157) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$900(NodeStatusUpdaterImpl.java:63) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:357) {noformat} The NM then calls System.exit(-1), which makes the unit test exit and produces an error that is hard to track down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
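The failure above appears to come from the NodeManager rebuilding its configuration from scratch when it reboots, so the caller-supplied settings (such as a resolvable log directory in the test) are lost and ${yarn.log.dir}/userlogs fails validation. The committed change passes the user's original configuration back into the re-init path. A rough sketch of that idea, not the actual NodeManager code:
{code}
// Sketch: remember the caller-supplied Configuration and reuse it on reboot.
import org.apache.hadoop.conf.Configuration;

class RebootableNodeManager {
  private Configuration conf;

  void initAndStartNodeManager(Configuration conf) {
    this.conf = conf;                      // keep the user's configuration
    // ... init and start services with this conf ...
  }

  void reboot() {
    // ... stop services ...
    initAndStartNodeManager(this.conf);    // reuse it instead of a fresh default
  }
}
{code}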
[jira] [Commented] (YARN-174) TestNodeStatusUpdater is failing in trunk
[ https://issues.apache.org/jira/browse/YARN-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482336#comment-13482336 ] Hudson commented on YARN-174: - Integrated in Hadoop-Mapreduce-trunk #1234 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1234/]) YARN-174. Modify NodeManager to pass the user's configuration even when rebooting. Contributed by Vinod Kumar Vavilapalli. (Revision 1401086) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401086 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java TestNodeStatusUpdater is failing in trunk - Key: YARN-174 URL: https://issues.apache.org/jira/browse/YARN-174 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Robert Joseph Evans Assignee: Vinod Kumar Vavilapalli Fix For: 2.0.2-alpha, 0.23.5 Attachments: YARN-174-20121022.txt, YARN-174.patch {noformat} 2012-10-19 12:18:23,941 FATAL [Node Status Updater] nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(277)) - Error starting NodeManager org.apache.hadoop.yarn.YarnException: ${yarn.log.dir}/userlogs is not a valid path. Path should be with file scheme or without scheme at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.validatePaths(LocalDirsHandlerService.java:321) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.init(LocalDirsHandlerService.java:95) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.init(LocalDirsHandlerService.java:123) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.init(NodeHealthCheckerService.java:48) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:165) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:274) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stateChanged(NodeManager.java:256) at org.apache.hadoop.yarn.service.AbstractService.changeState(AbstractService.java:163) at org.apache.hadoop.yarn.service.AbstractService.stop(AbstractService.java:112) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.stop(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.reboot(NodeStatusUpdaterImpl.java:157) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$900(NodeStatusUpdaterImpl.java:63) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:357) {noformat} The NM then calls System.exit(-1), which makes the unit test exit and produces an error that is hard to track down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482353#comment-13482353 ] Hadoop QA commented on YARN-177: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550459/YARN-177.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/117//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/117//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/117//console This message is automatically generated. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482397#comment-13482397 ] Thomas Graves commented on YARN-177: The LeafQueue has a setParentQueue that is unused and can be removed now. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
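The negative usedCapacity in the description is the kind of accounting drift that can occur if a configuration refresh replaces queue objects while containers are still charged against the old ones; the patch under review appears to keep existing queue objects and reinitialize them in place, which is also why LeafQueue's setParentQueue becomes unused. A very rough sketch of that reinitialize-in-place idea, with simplified names that do not match the CapacityScheduler classes:
{code}
// Illustrative only: reuse existing queues on refresh so usage is preserved.
import java.util.HashMap;
import java.util.Map;

class QueueRegistry {
  static class Queue {
    int usedMemoryMB;                       // accumulated usage to preserve
    float capacity;
    void reinitialize(float newCapacity) {  // update limits, keep usage
      this.capacity = newCapacity;
    }
  }

  private final Map<String, Queue> queues = new HashMap<>();

  /** Apply a refreshed configuration: reuse existing queues, add new ones. */
  void refresh(Map<String, Float> newCapacities) {
    for (Map.Entry<String, Float> e : newCapacities.entrySet()) {
      Queue q = queues.get(e.getKey());
      if (q == null) {
        q = new Queue();                    // genuinely new queue
        queues.put(e.getKey(), q);
      }
      q.reinitialize(e.getValue());         // existing queues keep usedMemoryMB
    }
  }
}
{code}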
[jira] [Commented] (YARN-178) Fix custom ProcessTree instance creation
[ https://issues.apache.org/jira/browse/YARN-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482491#comment-13482491 ] Robert Joseph Evans commented on YARN-178: -- Makes sense. Bikas, unless you have a strong objection I will check this in this afternoon. Fix custom ProcessTree instance creation Key: YARN-178 URL: https://issues.apache.org/jira/browse/YARN-178 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.5 Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Attachments: pstree-instance2.txt, pstree-instance.txt 1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable. 2. The process tree does not extend Configured as it should. Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482518#comment-13482518 ] Hadoop QA commented on YARN-177: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550499/YARN-177.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/118//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/118//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/118//console This message is automatically generated. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482563#comment-13482563 ] Hadoop QA commented on YARN-177: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550506/YARN-177.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/119//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/119//console This message is automatically generated. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch, YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482589#comment-13482589 ] Robert Joseph Evans commented on YARN-180: -- The patch looks mostly good. I am a bit confused by {code}if (containerToken == null) { containerToken = null; // Try again later. } {code} inside the new createContainerToken method. It is a copy and paste from before, but it is not needed any more. Other than that it looks good. Since Arun is on a plane now, I will upload a new patch. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
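The fix being reviewed moves container-token creation out of the reservation path so the roughly ten-minute token lifetime only starts when the container is actually handed out; the quoted null-check-then-assign-null block is dead code left over from the old call site. A hedged sketch of the intended shape, with illustrative types and method names rather than the real scheduler API:
{code}
// Sketch: mint the token at assignment time, never at reservation time.
class ReservedContainerExample {
  static final class Token { final long issuedAt = System.currentTimeMillis(); }

  /** Reserving space on a node: no token yet, so its lifetime cannot expire
      while the container waits for space to free up. */
  void reserve(String nodeId, int memoryMB) {
    // bookkeeping only; deliberately no createContainerToken() call here
  }

  /** Fulfilling the reservation later: the token is minted only now. */
  Token assign(String nodeId, int memoryMB) {
    Token token = createContainerToken(nodeId);
    if (token == null) {
      return null;               // secure-mode failure: retry on a later heartbeat
    }
    return token;
  }

  private Token createContainerToken(String nodeId) {
    // In a secure cluster this would ask the token secret manager; the sketch
    // simply fabricates one.
    return new Token();
  }
}
{code}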
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482591#comment-13482591 ] Robert Joseph Evans commented on YARN-180: -- Oh, I noticed that the containerToken is never assigned anyway. I will fix that too. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated YARN-180: - Attachment: YARN-180.patch Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482611#comment-13482611 ] Hadoop QA commented on YARN-180: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550518/YARN-180.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/121//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/121//console This message is automatically generated. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-183) Clean up fair scheduler code
Sandy Ryza created YARN-183: --- Summary: Clean up fair scheduler code Key: YARN-183 URL: https://issues.apache.org/jira/browse/YARN-183 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor The fair scheduler code has a bunch of minor stylistic issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-167) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/YARN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482716#comment-13482716 ] Robert Joseph Evans commented on YARN-167: -- I am rather nervous about back porting MAPREDUCE-3353. It is a major feature that has a significant footprint and was not all that stable when it first went in. I know that it has since stabilized but I am still nervous about such a large change. It seems like it would be simpler to handle the KILL events in the states that missed it. AM stuck in KILL_WAIT for days -- Key: YARN-167 URL: https://issues.apache.org/jira/browse/YARN-167 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
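The simpler route suggested here is to register explicit transitions for kill events in the task-attempt states that currently drop them, instead of backporting the full MAPREDUCE-3353 rework. The sketch below only illustrates that mechanism with a small table-driven state machine; the state and event names are made up, and whether the right target state is KILLED or SUCCEEDED is exactly the semantic question debated in the follow-up comments.
{code}
// Illustrative sketch of registering kill transitions for states that
// previously ignored them; not the real MapReduce TaskAttemptImpl machinery.
import java.util.EnumMap;
import java.util.Map;

class AttemptStateMachine {
  enum State { RUNNING, FAIL_CONTAINER_CLEANUP, SUCCESS_CONTAINER_CLEANUP, KILLED }
  enum Event { TA_KILL, TA_CONTAINER_CLEANED }

  private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);
  private State current = State.RUNNING;

  AttemptStateMachine() {
    addTransition(State.RUNNING, Event.TA_KILL, State.KILLED);
    // States that previously dropped TA_KILL get an explicit transition, so a
    // kill arriving during container cleanup still terminates the attempt.
    addTransition(State.FAIL_CONTAINER_CLEANUP, Event.TA_KILL, State.KILLED);
    addTransition(State.SUCCESS_CONTAINER_CLEANUP, Event.TA_KILL, State.KILLED);
  }

  private void addTransition(State from, Event on, State to) {
    transitions.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
  }

  void handle(Event e) {
    Map<Event, State> m = transitions.get(current);
    if (m != null && m.containsKey(e)) {
      current = m.get(e);
    } // events with no registered transition are ignored, as before
  }
}
{code}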
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482751#comment-13482751 ] Vinod Kumar Vavilapalli commented on YARN-147: -- Got lost between YARN-3 and YARN-147 :) A very nice patch to have; I am willing to get it in ASAP given the amount of time it's been around. Some comments below. - yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml - If the configs can be done away with (see below), ignore this comment. The descriptions for all the new configs in yarn-default.xml heavily reference code. We should simplify them to not address code and instead make them understandable by users and cross reference other related parameters. - {code}// Based on testing, ApplicationMaster executables don't terminate until // a little after the container appears to have finished. Therefore, we // wait a short bit for the cgroup to become empty before deleting it. {code} Can you explain this? Is this sleep necessary? Depending on its importance, we'll need to fix the following ID check; AMs don't always have an ID equal to one. - container-executor.c: If a mount-point is already mounted, mount gives an EBUSY error, so mount_cgroup() will need to be fixed to support remounts (e.g. on NM restarts). We could unmount the cgroup fs on shutdown but that isn't always guaranteed. - Please update if you have tested it on a secure setup with LCE enabled with and without cgroups. The following are already raised by others in some way, but I don't see them fixed in the latest patch. Unless I am missing something: - Not sure of the benefit of a configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't the NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix. - CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the latter, if cgroups are enabled and we can't create the file, isn't that a critical error? One overarching improvement worth pursuing immediately, either now or in follow-up tickets: - Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMonitor is top-level; we should make ResourcesHandler also top-level so that other platforms don't need to create this type hierarchy all over again when they wish to implement some or all of this functionality. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch This is a clone of YARN-3 to be able to submit the patch, as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
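On the quoted comment about waiting "a short bit for the cgroup to become empty before deleting it": a cgroup directory cannot be removed while any task is still attached to it, so the handler retries the delete briefly after the container exits. A sketch of that retry loop, with an illustrative timeout and none of the real CgroupsLCEResourcesHandler plumbing:
{code}
// Sketch: rmdir on a cgroup fails until no tasks remain, so retry briefly.
import java.io.File;

class CgroupCleaner {
  /** Try to remove a container's cgroup, retrying until it is empty or timed out. */
  boolean deleteCgroup(String cgroupDir, long timeoutMs) throws InterruptedException {
    File dir = new File(cgroupDir);
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (dir.delete()) {          // succeeds only once no tasks remain in the group
        return true;
      }
      Thread.sleep(20);            // processes may take a moment to fully exit
    }
    return false;                  // still busy after the timeout; caller can log it
  }
}
{code}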
[jira] [Updated] (YARN-181) capacity-scheduler.xml move breaks Eclipse import
[ https://issues.apache.org/jira/browse/YARN-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-181: - Summary: capacity-scheduler.xml move breaks Eclipse import (was: capacity-scheduler.cfg move breaks Eclipse import) I am sure you meant capacity-scheduler.xml capacity-scheduler.xml move breaks Eclipse import - Key: YARN-181 URL: https://issues.apache.org/jira/browse/YARN-181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Attachments: YARN181_jenkins.txt, YARN181_postSvnMv.txt, YARN181_svn_mv.sh Eclipse doesn't seem to handle testResources which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-184) Remove unnecessary locking in fair scheduler, and address findbugs.
Sandy Ryza created YARN-184: --- Summary: Remove unnecessary locking in fair scheduler, and address findbugs. Key: YARN-184 URL: https://issues.apache.org/jira/browse/YARN-184 Project: Hadoop YARN Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
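One plausible shape for the locking cleanup, assuming the QueueManager fields are only replaced wholesale when the allocation file is reloaded: publish each field as a volatile reference to an immutable snapshot, so readers need no locks at all. This is a guess at the intent, not the actual patch, and the names are illustrative.
{code}
// Illustrative sketch of lock-free reads against reload-only state.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class QueueSettings {
  private volatile Map<String, Integer> minShares = Collections.emptyMap();

  /** Readers see a consistent snapshot without locking. */
  int getMinShare(String queue) {
    return minShares.getOrDefault(queue, 0);
  }

  /** Called only from the allocation-file reload thread. */
  void reload(Map<String, Integer> fresh) {
    minShares = Collections.unmodifiableMap(new HashMap<>(fresh));
  }
}
{code}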
[jira] [Commented] (YARN-181) capacity-scheduler.xml move breaks Eclipse import
[ https://issues.apache.org/jira/browse/YARN-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482775#comment-13482775 ] Hudson commented on YARN-181: - Integrated in Hadoop-trunk-Commit #2919 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2919/]) YARN-181. Fixed eclipse settings broken by capacity-scheduler.xml move via YARN-140. Contributed by Siddharth Seth. (Revision 1401504) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401504 Files : * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml capacity-scheduler.xml move breaks Eclipse import - Key: YARN-181 URL: https://issues.apache.org/jira/browse/YARN-181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Fix For: 2.0.3-alpha Attachments: YARN181_jenkins.txt, YARN181_postSvnMv.txt, YARN181_svn_mv.sh Eclipse doesn't seem to handle testResources which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-167) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/YARN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482778#comment-13482778 ] Vinod Kumar Vavilapalli commented on YARN-167: -- bq. Afterwards, the Task Attempt transitions from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED. In either of these states TA_KILL is ignored. So the Task stays in KILL_WAIT and consequently the Job too. This is fine. The job waits for all tasks and task attempts to 'finish', not just for the killed ones. In this case, the TA will succeed and inform the job about the same, so that the job doesn't wait for this task anymore. bq. I am rather nervous about back porting MAPREDUCE-3353. It is a major feature that has a significant footprint and was not all that stable when it first went in. I know that it has since stabilized but I am still nervous about such a large change. I understand that it is a big change, but if we want to address this issue, we need that patch. Given that MAPREDUCE-3353 has hardened on trunk, we should consider pulling it into 0.23. bq. It seems like it would be simpler to handle the KILL events in the states that missed it. There isn't anything like a missed state that is causing this issue, if I understand Ravi's issue description correctly. AM stuck in KILL_WAIT for days -- Key: YARN-167 URL: https://issues.apache.org/jira/browse/YARN-167 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-182) Container killed by the ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-182: - Attachment: Log.txt Attaching log. Container killed by the ApplicationMaster - Key: YARN-182 URL: https://issues.apache.org/jira/browse/YARN-182 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha Reporter: zhengqiu cai Labels: hadoop Attachments: Log.txt I was running wordcount and the resourcemanager web UI showed the status as FINISHED SUCCEEDED, but the log showed Container killed by the ApplicationMaster -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-182) Unnecessary Container killed by the ApplicationMaster message for successful containers
[ https://issues.apache.org/jira/browse/YARN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-182: - Summary: Unnecessary Container killed by the ApplicationMaster message for successful containers (was: Container killed by the ApplicationMaster) Unnecessary Container killed by the ApplicationMaster message for successful containers - Key: YARN-182 URL: https://issues.apache.org/jira/browse/YARN-182 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha Reporter: zhengqiu cai Labels: hadoop Attachments: Log.txt I was running wordcount and the resourcemanager web UI showed the status as FINISHED SUCCEEDED, but the log showed Container killed by the ApplicationMaster -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-181) capacity-scheduler.xml move breaks Eclipse import
[ https://issues.apache.org/jira/browse/YARN-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482811#comment-13482811 ] Hudson commented on YARN-181: - Integrated in Hadoop-Yarn-trunk #13 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/13/]) YARN-181. Fixed eclipse settings broken by capacity-scheduler.xml move via YARN-140. Contributed by Siddharth Seth. (Revision 1401504) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401504 Files : * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml capacity-scheduler.xml move breaks Eclipse import - Key: YARN-181 URL: https://issues.apache.org/jira/browse/YARN-181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Fix For: 2.0.3-alpha Attachments: YARN181_jenkins.txt, YARN181_postSvnMv.txt, YARN181_svn_mv.sh Eclipse doesn't seem to handle testResources which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-179) Bunch of test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482814#comment-13482814 ] Hudson commented on YARN-179: - Integrated in Hadoop-Yarn-trunk #13 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/13/]) YARN-179. Fix some unit test failures. (Contributed by Vinod Kumar Vavilapalli) (Revision 1401481) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401481 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Bunch of test failures on trunk --- Key: YARN-179 URL: https://issues.apache.org/jira/browse/YARN-179 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.2-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 2.0.3-alpha Attachments: YARN-179-20121022.3.txt, YARN-179-20121022.4.txt {{CapacityScheduler.setConf()}} mandates a YarnConfiguration. It doesn't need to; throughout all of YARN, components depend only on Configuration and rely on the callers to provide the correct configuration. This is causing multiple tests to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
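The description spells out the fix: setConf() should accept any Configuration rather than insisting on a YarnConfiguration, wrapping it only if YARN defaults are needed. A minimal sketch of that pattern; ExampleScheduler is a placeholder class, not the real CapacityScheduler.
{code}
// Sketch: accept a plain Configuration and wrap it instead of rejecting it.
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class ExampleScheduler implements Configurable {
  private Configuration conf;

  @Override
  public void setConf(Configuration conf) {
    // No hard requirement on the concrete type: callers pass whatever they have.
    this.conf = conf instanceof YarnConfiguration
        ? conf : new YarnConfiguration(conf);
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}
{code}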
[jira] [Updated] (YARN-184) Remove unnecessary locking in fair scheduler, and address findbugs excludes.
[ https://issues.apache.org/jira/browse/YARN-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-184: Summary: Remove unnecessary locking in fair scheduler, and address findbugs excludes. (was: Remove unnecessary locking in fair scheduler, and address findbugs.) Remove unnecessary locking in fair scheduler, and address findbugs excludes. Key: YARN-184 URL: https://issues.apache.org/jira/browse/YARN-184 Project: Hadoop YARN Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-167) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/YARN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482853#comment-13482853 ] Vinod Kumar Vavilapalli commented on YARN-167: -- bq. There isn't anything like a missed state that is causing this issue if I understand Ravi's issue description correctly. Obviously, this could be wrong. Ravi, if you have one of these stuck AMs lying around, can you take a thread dump please? AM stuck in KILL_WAIT for days -- Key: YARN-167 URL: https://issues.apache.org/jira/browse/YARN-167 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-139) Interrupted Exception within AsyncDispatcher leads to user confusion
[ https://issues.apache.org/jira/browse/YARN-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-139: - Attachment: YARN-139-20121023.txt Thanks Jason, was thinking of fixing that separately but here it goes. Interrupted Exception within AsyncDispatcher leads to user confusion Key: YARN-139 URL: https://issues.apache.org/jira/browse/YARN-139 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.2-alpha, 0.23.4 Reporter: Nathan Roberts Assignee: Vinod Kumar Vavilapalli Attachments: YARN-139-20121019.1.txt, YARN-139-20121019.txt, YARN-139-20121023.txt, YARN-139.txt Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up. 2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) at java.lang.Thread.join(Thread.java:1196) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:619) 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
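The confusing log comes from AsyncDispatcher.stop() joining its event-handling thread and being interrupted during shutdown, which is then logged as a WARN with a full stack trace. The sketch below shows the general idea of treating that interrupt as an expected part of stopping, preserving the interrupt status without alarming output; it mirrors the intent of the patch rather than its exact code.
{code}
// Sketch: a quiet shutdown path for a dispatcher-style event thread.
class DispatcherStopExample {
  private volatile boolean stopped = false;   // drain flag read by the handler thread
  private Thread eventHandlingThread;

  void stop() {
    stopped = true;
    if (eventHandlingThread != null) {
      eventHandlingThread.interrupt();
      try {
        eventHandlingThread.join(1000);       // bounded wait for the drain
      } catch (InterruptedException ie) {
        // Expected while tearing down; restore the interrupt status and move on
        // instead of warning the user with a stack trace.
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}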
[jira] [Commented] (YARN-139) Interrupted Exception within AsyncDispatcher leads to user confusion
[ https://issues.apache.org/jira/browse/YARN-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482876#comment-13482876 ] Hadoop QA commented on YARN-139: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550561/YARN-139-20121023.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/123//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/123//console This message is automatically generated. Interrupted Exception within AsyncDispatcher leads to user confusion Key: YARN-139 URL: https://issues.apache.org/jira/browse/YARN-139 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.2-alpha, 0.23.4 Reporter: Nathan Roberts Assignee: Vinod Kumar Vavilapalli Attachments: YARN-139-20121019.1.txt, YARN-139-20121019.txt, YARN-139-20121023.txt, YARN-139.txt Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up. 2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) at java.lang.Thread.join(Thread.java:1196) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:619) 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482887#comment-13482887 ] Thomas Graves commented on YARN-180: +1 for the latest patch. I manually tested this on a small cluster and verified that a container can be reserved for 10 minutes and the AM can still start the container after finally being allocated it. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180-branch_0.23.patch, YARN-180.patch, YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-178) Fix custom ProcessTree instance creation
[ https://issues.apache.org/jira/browse/YARN-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482917#comment-13482917 ] Bikas Saha commented on YARN-178: - Sure. Please go ahead. Fix custom ProcessTree instance creation Key: YARN-178 URL: https://issues.apache.org/jira/browse/YARN-178 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.5 Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Attachments: pstree-instance2.txt, pstree-instance.txt 1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable. 2. The process tree does not extend Configured as it should. Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira