[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267455#comment-14267455 ] Hadoop QA commented on YARN-3006: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690518/YARN-3006.001.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6267//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6267//console This message is automatically generated. Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown. {code} # yarn rmadmin -failover rm1 rm2 Exception in thread main java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51) at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94) at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311) at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622) {code} I'm thinking the above message is confusing to users. (Users may think whether ZKFC is configured correctly...) The command should output error message to stderr instead of throwing Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267388#comment-14267388 ] Hadoop QA commented on YARN-2996: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690501/YARN-2996.003.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6266//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6266//console This message is automatically generated. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
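A minimal sketch of the first refinement above, assuming the standard Hadoop {{FileSystem}} API; the helper name is hypothetical and not part of the attached patches:
{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsStatusHelper {
  /**
   * Hypothetical helper: issues a single getFileStatus RPC instead of
   * fs.exists(path) followed by fs.getFileStatus(path) (two RPCs).
   * Returns null when the path does not exist.
   */
  static FileStatus getFileStatusIfExists(FileSystem fs, Path path)
      throws IOException {
    try {
      return fs.getFileStatus(path);   // one round trip instead of two
    } catch (FileNotFoundException e) {
      return null;                     // absent path; caller checks for null
    }
  }
}
{code}
The second refinement would apply the same spirit to {{updateFile}}: write the new content once and rename it directly over the target, dropping one of the two renames in the current .tmp -> .new -> final sequence.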
[jira] [Updated] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3006: Attachment: YARN-3006.001.patch Attaching a simple patch. Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-3006.001.patch When executing a manual failover with automatic failover enabled, an UnsupportedOperationException is thrown.
{code}
# yarn rmadmin -failover rm1 rm2
Exception in thread "main" java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address
 at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
 at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
 at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311)
 at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282)
 at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449)
 at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378)
 at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622)
{code}
I think the above message is confusing to users (they may wonder whether ZKFC is configured correctly). The command should output an error message to stderr instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
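The attached YARN-3006.001.patch is not reproduced here; the following is only a rough sketch of the general idea, checking the auto-failover setting up front and reporting on stderr. The class name, method name, default value, and message text are illustrative assumptions, not the actual patch:
{code}
import org.apache.hadoop.conf.Configuration;

public class ManualFailoverGuard {
  /** Illustrative guard for a CLI path; returns a non-zero exit code instead
   *  of letting UnsupportedOperationException escape to the user. */
  static int checkManualFailoverAllowed(Configuration conf) {
    // The default shown here is illustrative; only the property name is from YARN.
    boolean autoFailover = conf.getBoolean(
        "yarn.resourcemanager.ha.automatic-failover.enabled", false);
    if (autoFailover) {
      System.err.println("failover: manual failover is not supported when "
          + "automatic failover is enabled; RM transitions are managed by "
          + "the embedded elector.");
      return -1;
    }
    return 0;   // manual failover may proceed
  }
}
{code}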
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267545#comment-14267545 ] Hudson commented on YARN-2427: -- FAILURE: Integrated in Hadoop-Yarn-trunk #800 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/800/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267663#comment-14267663 ] Hudson commented on YARN-2427: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1998 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1998/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267664#comment-14267664 ] Hudson commented on YARN-2978: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1998 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1998/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267540#comment-14267540 ] Hudson commented on YARN-2978: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/66/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267539#comment-14267539 ] Hudson commented on YARN-2427: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/66/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267546#comment-14267546 ] Hudson commented on YARN-2978: -- FAILURE: Integrated in Hadoop-Yarn-trunk #800 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/800/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268756#comment-14268756 ] Sunil G commented on YARN-2933: --- Hi [~mayank_bansal] and [~wangda], this is a much-needed implementation with respect to node labels in the preemption scenario. However, I have a concern; please disregard it if this has already been considered. The containers of an application that did not specify any labels at submission time may land on both labelled and non-labelled nodes, correct? If so, with this implementation preemption will always target the containers running on non-labelled nodes, which may not be accurate. So is it possible to preempt only from applications that were submitted without any node labels? -Sunil Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting preemption that respects node labels, but we have some gaps in the code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to think about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268767#comment-14268767 ] Gera Shegalov commented on YARN-2934: - bq. Given this, even the tailed stderr is not useful in such a situation. If the app-page ages out, where will the user see this additional diagnostic message that we tail out of logs? It will be in the client output that I showed in the above comments. In our infrastructure, a failed job will generate an alert email containing the client log (or link to it). Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268761#comment-14268761 ] Vinod Kumar Vavilapalli commented on YARN-2934: --- bq. Yes it's related, but not exclusive to AM (try -Dmapreduce.map.env=JAVA_HOME=/no/jvm/here). It's just more severe with AM. Agreed. I was just saying that we can do what we did for AMs. bq. The pointer to the tracking page can be of little value for a busy cluster. The RMApp is likely to age out by the time the user gets to look at it, and there is no JHS entry because the AM crashed. Good point, I missed this one. Given this, even the tailed stderr is not useful in such a situation. If the app-page ages out, where will the user see this additional diagnostic message that we tail out of logs? bq. It would be better to mention the nodeAddress as well, in addition to containerId to be used with 'yarn logs' This can be done in the additional message (like for AM) instead of cat/tail of logs. I guess the options are (1) Diagnostic message with links and reference to the right logs saying something happened or (2) Diagnostic message itself containing the tail of the log (which may or may not really determine the error message). I think (1) is a must, (2) is a good to have. Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
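For option (2), tailing the container's stderr into the diagnostics could look roughly like the sketch below. This only illustrates the mechanics; the file name, size limit, and where the text gets attached are assumptions, not the eventual patch:
{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class StderrTail {
  /** Reads at most maxBytes from the end of a container's stderr file so the
   *  tail can be appended to the container diagnostics. */
  static String tail(File stderrFile, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(stderrFile, "r")) {
      long start = Math.max(0, raf.length() - maxBytes);
      raf.seek(start);
      byte[] buf = new byte[(int) (raf.length() - start)];
      raf.readFully(buf);
      return new String(buf, StandardCharsets.UTF_8);
    }
  }
}
{code}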
[jira] [Created] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
MUFEED USMAN created YARN-3017: -- Summary: ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID Key: YARN-3017 URL: https://issues.apache.org/jira/browse/YARN-3017 Project: Hadoop YARN Issue Type: Improvement Reporter: MUFEED USMAN Priority: Minor Not sure if this should be filed as a bug or not. In the ResourceManager log, in the events surrounding the creation of a new application attempt:
...
2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_02
...
The application attempt has the ID format _1412150883650_0001_02, whereas the associated ContainerID goes by _1412150883650_0001_02_:
...
2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_01, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: <memory:2048, vCores:1, disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
...
Curious to know whether this is kept like that for a reason. If not, then when using filtering tools to, say, grep for events surrounding a specific attempt by the numeric ID part, information may slip through during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.32.patch Check tests using absoluteCapacity for userAmLimit maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated as follows:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: "
        + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user actually uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all of the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
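A small arithmetic sketch of the example in the description, showing why counting AMs against {{minimum_allocation}} lets AM usage blow past the intended cap (the numbers are the ones from the example, not from a real cluster):
{code}
public class AmLimitExample {
  public static void main(String[] args) {
    long queueCapacityMB = 1000;   // "1G" queue, using 1G = 1000M as in the example
    double maxAmPercent = 0.2;     // maximum-am-resource-percent
    long minAllocMB = 1;           // minimum_allocation
    long amHeadroomMB = (long) (queueCapacityMB * maxAmPercent);  // 200M for AMs
    long maxAmCount = amHeadroomMB / minAllocMB;                  // 200 AMs "allowed"
    long actualAmMB = 5;           // each AM really asks for 5M (> minimum_allocation)
    long usedByAmsMB = maxAmCount * actualAmMB;                   // 1000M: the whole queue
    System.out.println("AM headroom: " + amHeadroomMB + "M, AMs admitted: "
        + maxAmCount + ", resource actually used by AMs: " + usedByAmsMB + "M");
  }
}
{code}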
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268887#comment-14268887 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690722/YARN-2637.32.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6278//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6278//console This message is automatically generated. maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. 
Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268791#comment-14268791 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690699/YARN-2637.31.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6277//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6277//console This message is automatically generated. maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM 
can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268853#comment-14268853 ] Rohith commented on YARN-3017: -- which version are you using? I donot see this behavior in trunk ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID -- Key: YARN-3017 URL: https://issues.apache.org/jira/browse/YARN-3017 Project: Hadoop YARN Issue Type: Improvement Reporter: MUFEED USMAN Priority: Minor Not sure if this should be filed as a bug or not. In the ResourceManager log in the events surrounding the creation of a new application attempt, ... ... 2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_02 ... ... The application attempt has the ID format _1412150883650_0001_02. Whereas the associated ContainerID goes by _1412150883650_0001_02_. ... ... 2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_01, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, vCores:1, disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02 ... ... Curious to know if this is kept like that for a reason. If not while using filtering tools to, say, grep events surrounding a specific attempt by the numeric ID part information may slip out during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268449#comment-14268449 ] Andrew Johnson commented on YARN-2893: -- I'm seeing this error on a non-secure cluster. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268456#comment-14268456 ] Wangda Tan commented on YARN-1743: -- [~zjffdu], Thanks for working on this ticket, the generated graph really helps a lot for people understanding how YARN works! Added the target version to see if we can get it in 2.7.0. And for the POC patch, can we force the annotation type to be Class? Which will makes it can always be automatically updated if we make any changes on type names. Wangda Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
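A rough illustration of the {{Class}}-based annotation idea (the names below are invented for this sketch; the POC patch may look quite different): referencing the event and state types as {{Class}} objects means a rename is caught by the compiler instead of leaving a stale string behind.
{code}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Sketch of a transition annotation that records its endpoints as Class
 *  references rather than free-form strings. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
public @interface TransitionInfo {
  Class<?> event();        // the event type that triggers the transition
  Class<?> sourceState();  // state (enum or class) the transition starts from
  Class<?> targetState();  // state the transition ends in
}
{code}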
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268465#comment-14268465 ] Hadoop QA commented on YARN-1743: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654900/NodeManager.pdf against trunk revision e13a484. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6273//console This message is automatically generated. Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1743: - Labels: documentation (was: ) Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Labels: documentation Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268506#comment-14268506 ] Chris K Wensel commented on YARN-3009: -- It feels as if the API is becoming conflated, not to mention brittle, if the filter value can be a nested JSON object instead of a literal value (string or number). If this is a requirement of the API, I would expose a new parameter on the query that clearly states the value should be interpreted as an object. But I suspect this is better served not by a key=nested_object query but by a path/to/attribute=literal_value query (or a composition of them). TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7', causing the filter to fail the search. It should be noted that the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard-to-identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
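Independent of the query-syntax question above, one way to avoid the partial parse described in this issue is to treat a filter value as numeric only when the entire token is a number. This is just a sketch of that idea, not the approach taken by the attached patch:
{code}
public class FilterValueParser {
  /** Returns a Long/Double only when the whole token is numeric;
   *  values such as "7CCA..." stay Strings instead of collapsing to 7. */
  static Object parseFilterValue(String raw) {
    try {
      if (raw.matches("-?\\d+")) {
        return Long.valueOf(raw);
      }
      if (raw.matches("-?\\d*\\.\\d+([eE][-+]?\\d+)?")) {
        return Double.valueOf(raw);
      }
    } catch (NumberFormatException e) {
      // e.g. overflow of a very long digit string; fall through to String
    }
    return raw;
  }
}
{code}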
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268546#comment-14268546 ] Jian He commented on YARN-2997: --- Ah, I think the problem that container statuses whose applications are stopped may be lost on NM resync existed before. Thanks for your clarification. One minor comment: for {{LinkedHashMap<ContainerId, ContainerStatus>()}}, wouldn't a regular HashMap be enough instead of a LinkedHashMap? NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.patch We have seen a lot of the following in the RM log: {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by the NM sending completed containers repeatedly until the app is finished. On the RM side, the container has already been released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150107-1-full.patch Updated against latest trunk Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268570#comment-14268570 ] Hudson commented on YARN-2936: -- FAILURE: Integrated in Hadoop-trunk-Commit #6825 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6825/]) YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on getProto method. Contributed by Varun Saxena (jianhe: rev 2638f4d0f0da375b0dd08f3188057637ed0f32d5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/CHANGES.txt YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, YARN-2936.006.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing a object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() of it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generating the password. I think the setter is removed to avoid duplicating setting the fields why getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150107-1-without-yarn.cmd.patch Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.004.patch Good idea Zhijie, update the patch. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268587#comment-14268587 ] Yi Liu commented on YARN-3010: -- Thanks [~jianhe] and [~rohithsharma] Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: 2.7.0 Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Summary: Refine fs operations in FileSystemRMStateStore and few fixes (was: Refine some fs operations in FileSystemRMStateStore to improve performance) Refine fs operations in FileSystemRMStateStore and few fixes Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268596#comment-14268596 ] Hudson commented on YARN-2880: -- FAILURE: Integrated in Hadoop-trunk-Commit #6826 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6826/]) Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev ef237bd52fc570292a7e608b373b51dd6d1590b8) * hadoop-yarn-project/CHANGES.txt Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled --- Key: YARN-2880 URL: https://issues.apache.org/jira/browse/YARN-2880 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, YARN-2880.1.patch, YARN-2880.2.patch As suggested by [~ozawa], [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. We should have a such test to make sure there will be no regression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268606#comment-14268606 ] Vinod Kumar Vavilapalli commented on YARN-160: -- Quick comments on the patch: - LinuxResourceCalculatorPlugin: numPhysicalSockets is not used anywhere? - WindowsResourceCalculatorPlugin: Why is num-cores set = num-processors ? - yarn-default.xml: Change "it will set the X to Y" to "it will set X to Y by default" - yarn.nodemanager.count-logical-processors-as-cores: Not sure of the use for this. On Linux, shouldn't we simply use the returned numCores if they are valid? And fall back to numProcessors? - yarn.nodemanager.enable-hardware-capability-detection: I think specifying the capabilities to be -1 is already a way to trigger this automatic detection, let's simply drop the flag and assume it to be true all the time? - CGroupsLCEResourceHandler: The log message 'LOG.info("node vcores = " + nodeVCores);' is printed for every container launch. - Should we enforce somewhere that numCores <= numProcessors, if not that it is always a multiple? {code} int containerPhysicalMemoryMB = (int) (0.8f * (physicalMemoryMB - (2 * hadoopHeapSizeMB))); {code} We already have resource.percentage-physical-cpu-limit for CPUs - YARN-2440. How about simply adding a resource.percentage-pmem-limit instead of making it a magic number in the code? Of course, we can have a default reserved percentage. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be available as YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
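The last point above (replacing the hard-coded {{0.8f}} with a configurable percentage) could look roughly like the following sketch; the property name is simply the one proposed in the comment and is hypothetical, not an existing YARN setting.
{code}
import org.apache.hadoop.conf.Configuration;

class PmemLimitSketch {
  // Hypothetical property name, taken from the suggestion in the comment.
  static final String PMEM_LIMIT_KEY =
      "yarn.nodemanager.resource.percentage-pmem-limit";
  static final float DEFAULT_PMEM_LIMIT_PERCENT = 80.0f;

  // Derive the NM memory resource from a configurable percentage of the
  // detected physical memory instead of a magic 0.8f in the code.
  static int containerMemoryMB(Configuration conf, int physicalMemoryMB) {
    float percent = conf.getFloat(PMEM_LIMIT_KEY, DEFAULT_PMEM_LIMIT_PERCENT);
    return (int) (physicalMemoryMB * percent / 100.0f);
  }
}
{code}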
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268623#comment-14268623 ] Vinod Kumar Vavilapalli commented on YARN-2893: --- That is very interesting. In non-secure mode, strictly in YARN's purview, no tokens really flow from the client to the RM. May be we should look at Scalding/Cascading 's submission code to see if it injects some tokens in non-secure mode too? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
Chris Nauroth created YARN-3015: --- Summary: yarn classpath command should support same options as hadoop classpath. Key: YARN-3015 URL: https://issues.apache.org/jira/browse/YARN-3015 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Chris Nauroth Priority: Minor HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268630#comment-14268630 ] Chris Nauroth commented on YARN-3015: - Thanks to [~aw] for reporting it. yarn classpath command should support same options as hadoop classpath. --- Key: YARN-3015 URL: https://issues.apache.org/jira/browse/YARN-3015 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Chris Nauroth Priority: Minor HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268631#comment-14268631 ] Wangda Tan commented on YARN-2933: -- In addition to the previous comment, I think we put an incorrect #containers for each application when setLabelContainer=true. The usedResource or current in TestProportionalPreemptionPolicy actually means the used resource of nodes without labels. So if we want to have a labeled container in an application, we should make it stay outside of usedResource. So in the patch, before: {code} for (int i = 0; i < used; i += gran) { if (setAMContainer && i == 0) { cLive.add(mockContainer(appAttId, cAlloc, unit, 0)); - } else { + } else if (setLabelContainer && i == 1) { + cLive.add(mockContainer(appAttId, cAlloc, unit, 2)); + } {code} We should add {code} +if (setLabelContainer) { + used++; +} {code} to make it correct. And {{testSkipLabeledContainer}} is fully covered by {{testIdealAllocationForLabels}}, since we have already checked the #containers preempted in each application in {{testIdealAllocationForLabels}}, which implies labeled containers are ignored. A minor suggestion is to rename {{setLabelContainer}} to {{setLabeledContainer}}. Thoughts? Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting to support preemption respecting node labels, but we have some gaps in the code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need refactoring of CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid a regression like: a cluster has some nodes with labels and some not; assume queueA isn't satisfied for resource without label, but for now, the preemption policy may preempt resource from nodes with labels for queueA, which is not correct. Again, it is just a short-term enhancement; YARN-2498 will consider preemption respecting node-labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268639#comment-14268639 ] Vinod Kumar Vavilapalli commented on YARN-2934: --- This seems like it is the same as YARN-2013/YARN-2242. For the AM, we ended up simply putting a log-message: "AM Container for \$ApplicationAttemptId exited with exitCode: \$ExitStatus.\nFor more detailed output, check application tracking page: \$TrackingUrl, Then, click on links to logs of each attempt.\n" You really don't want to cat stderr from containers. Containers may run for a very long time, spewing a lot of errors to stderr before finally failing. The NM unconditionally reading logs in such cases will blow up the NM heap. We either do a cross-platform way of tailing the last N bytes (not terribly useful if we cut lines midway through) or, better, simply print a link to take them to the right set of logs. Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
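A cross-platform "last N bytes" read along the lines suggested above could be as simple as the following standalone sketch (not code from any attached patch); it reads only the tail of the file, so a container that spews gigabytes to stderr cannot blow up the NM heap.
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class StderrTailSketch {
  // Read at most maxBytes from the end of the file.
  static String tail(String path, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
      long start = Math.max(0, raf.length() - maxBytes);
      byte[] buf = new byte[(int) (raf.length() - start)];
      raf.seek(start);
      raf.readFully(buf);
      // May start mid-line or mid-character; acceptable for diagnostics.
      return new String(buf, StandardCharsets.UTF_8);
    }
  }
}
{code}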
[jira] [Assigned] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3015: -- Assignee: Varun Saxena yarn classpath command should support same options as hadoop classpath. --- Key: YARN-3015 URL: https://issues.apache.org/jira/browse/YARN-3015 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Chris Nauroth Assignee: Varun Saxena Priority: Minor HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268644#comment-14268644 ] Naganarasimha G R commented on YARN-3009: - Hi [~zjshen], Thanks for pointing out the interface. I think we could add a test case which takes some other object like a List/Map as well. bq. would it be sufficient if we did not perform the comparison with the original String when the resulting Object is a List or Map? Or do you think a different approach would be better? As the resulting object can be of any type, and not just List or Map, that would not be feasible. But we can think of it the other way: if the resulting object is a subclass of {{java.lang.Number}}, then we can apply the check I gave earlier, though I am not sure whether even this approach could break in some other case. bq. I would expose a new parameter on the query that clearly states the value should be interpreted as an object. This also seems to be a suitable alternative for this issue; for example, we could take the type of the object [or a flag indicating it is not a wrapper object] as a third field separated by a comma character. bq. better served instead of key=nested_object as path/to/attribute=literal_value (or a composition of them) I did not get this; can you give an example? [~zjshen], which approach would be better? TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
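A compact illustration of the {{java.lang.Number}} check discussed above (not the actual TimelineWebServices code): after parsing, fall back to the original String whenever the parser produced a Number whose string form does not round-trip to the raw value, e.g. {{7CCA...}} being truncated to {{7}}.
{code}
class FilterValueCheckSketch {
  // "parsed" is whatever the existing parser produced for the raw filter value.
  static Object preferStringOverTruncatedNumber(String raw, Object parsed) {
    if (parsed instanceof Number && !parsed.toString().equals(raw.trim())) {
      return raw;     // e.g. "7CCA..." parsed as 7: keep the original String
    }
    return parsed;    // round-trips cleanly (or is not a Number): keep as-is
  }
}
{code}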
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268647#comment-14268647 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690672/YARN-2786-20150107-1-without-yarn.cmd.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6275//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6275//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6275//console This message is automatically generated. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager
Wangda Tan created YARN-3016: Summary: (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager Key: YARN-3016 URL: https://issues.apache.org/jira/browse/YARN-3016 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Now we have separate but similar implementations for add/remove/replace labels on node in CommonNodeLabelsManager; we should merge them into a single one to make them easier to modify and more readable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268110#comment-14268110 ] Hadoop QA commented on YARN-3009: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690592/YARN-3009.20150108-1.patch against trunk revision fe8d2bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6272//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6272//console This message is automatically generated. TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268162#comment-14268162 ] Rohith commented on YARN-3013: -- AYS#containerLaunchedOnNode and AYS#killOrphanContainerOnNode do not hold the synchronized lock. Adding synchronization to both of the methods should be fine. There will not be any interlocking caused by fixing this. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
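For readers unfamiliar with this findbugs category, a minimal standalone illustration of IS2_INCONSISTENT_SYNC follows (hypothetical class, not the YARN code): a field is guarded by the instance lock in most methods but read without it in one, and synchronizing the remaining accessor is exactly the kind of fix being proposed here.
{code}
class InconsistentSyncSketch {
  private Object rmContextLike;

  synchronized void set(Object ctx) {
    rmContextLike = ctx;             // write under the instance lock
  }

  // Before: Object get() { return rmContextLike; }   // unlocked read -> IS2 warning
  synchronized Object get() {
    return rmContextLike;            // read under the same lock, warning resolved
  }
}
{code}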
[jira] [Assigned] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3013: Assignee: Rohith Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3013: - Attachment: 0001-YARN-3013.patch Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268179#comment-14268179 ] Rohith commented on YARN-3013: -- [~zjshen] kindly review analysis and patch Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3013. --- Resolution: Duplicate Close it as the duplicate. Thanks for pointing it out. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
Zhijie Shen created YARN-3013: - Summary: Findbugs warning aboutAbstractYarnScheduler.rmContext Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268160#comment-14268160 ] Billie Rinaldi commented on YARN-3009: -- bq. the nested Json structure will be mistaken as a string Okay, would it be sufficient if we did not perform the comparison with the original String when the resulting Object is a List or Map? Or do you think a different approach would be better? TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268171#comment-14268171 ] Rohith commented on YARN-3013: -- There will not be any possibility of deadlock by adding the synchronized keyword to the methods. It should be fine. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268193#comment-14268193 ] Andrew Johnson commented on YARN-2893: -- I've also noticed that if multiple jobs are submitted at the same time and this error occurs, all the jobs will fail. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268188#comment-14268188 ] Rohith commented on YARN-3010: -- +1(non-binding) LGTM. Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2637: - Summary: maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. (was: maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when a new application is submitted to the RM, it will check whether the app can be activated in the following way: {code} for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() >= getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName()); } } {code} An example is: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M; assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all resources of the queue instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
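One way to read the new summary is that activation should check AM resource usage for both the queue and the user, not just application counts. A deliberately simplified, memory-only sketch of such a check follows (plain ints and hypothetical names, not the scheduler's actual Resource-based code).
{code}
class AmLimitCheckSketch {
  // Activate a pending app only if both the queue's and the user's current AM
  // usage, plus this AM's resource, still fit under the respective AM limits.
  static boolean canActivate(int queueAmUsedMB, int userAmUsedMB, int amMB,
                             int queueAmLimitMB, int userAmLimitMB) {
    return queueAmUsedMB + amMB <= queueAmLimitMB
        && userAmUsedMB + amMB <= userAmLimitMB;
  }
}
{code}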
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268183#comment-14268183 ] Andrew Johnson commented on YARN-2893: -- I am also encountering this same error. The failures are pretty sporadic and I've never been able to reproduce it. Resubmitting the failed job always works, however. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268184#comment-14268184 ] Rohith commented on YARN-3013: -- Oh, I had not seen it earlier!! This issue is a duplicate of YARN-3010. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268359#comment-14268359 ] Andrew Johnson commented on YARN-2893: -- No, it's at least 95% Scalding jobs. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3014) Changing labels on a host should update all NM's labels on that host
[ https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3014: - Description: Admin can either specify labels on a host (by running {{yarn rmadmin -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn rmadmin -replaceLabelsOnNode host1:port,label1}}). If user has specified label=x on a NM (instead of host), and later set the label=y on host of the NM. NM's label should update to y as well. Target Version/s: 2.7.0 Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Admin can either specify labels on a host (by running {{yarn rmadmin -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn rmadmin -replaceLabelsOnNode host1:port,label1}}). If user has specified label=x on a NM (instead of host), and later set the label=y on host of the NM. NM's label should update to y as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268403#comment-14268403 ] Jason Lowe commented on YARN-2902: -- Thanks for the patch, Varun! I think the patch will prevent us from leaking the bookkeeping in resource trackers for resources in the downloading state, but it relies on the periodic retention checking and doesn't address the leaking of data on the disk. The localizer has probably created a partial *_tmp file/dir for the download that didn't complete, and we should be cleaning that up as well. As is, we won't try to clean up any leaked DOWNLOADING resource until the retention process runs (on the order of tens of minutes), but we shouldn't need to wait around to reap resources that aren't really downloading. I haven't had time to work this all the way through, but I'm wondering if we're patching the symptoms rather than the root cause. The resource is lingering around in the DOWNLOADING state because a container was killed and we then forgot the corresponding localizer that was associated with the container. When the localizer later heartbeats in, the NM tells the unknown localizer to DIE, and that ultimately is what leads to a resource lingering around in the DOWNLOADING state. I think we should be properly cleaning up localizers corresponding to killed containers and sending appropriate events to the LocalizedResources. This will then cause the resources to transition out of the DOWNLOADING state to something appropriate, sending the proper events to any other containers that are pending on that resource. At that point we can also clean up any leaked _tmp files/dirs from the failed/killed localizer. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
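The "_tmp cleanup" mentioned above could look roughly like the following helper (hypothetical; it assumes the partial download lives next to the target as <name>_tmp, per the "*_tmp file/dir" wording in the comment, and the eventual patch may organize this differently).
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class PartialDownloadCleanupSketch {
  // When a localizer is killed mid-download, remove the partial sibling the
  // download would have been written to. Assumes a <target>_tmp layout.
  static void cleanupPartialDownload(FileSystem lfs, Path target)
      throws IOException {
    Path tmp = new Path(target.getParent(), target.getName() + "_tmp");
    lfs.delete(tmp, true);   // recursive; returns false if tmp does not exist
  }
}
{code}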
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268340#comment-14268340 ] Andrew Johnson commented on YARN-2893: -- [~jira.shegalov] This is always with Scalding jobs. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268356#comment-14268356 ] Gera Shegalov commented on YARN-2893: - Is there a significant fraction of other type of jobs on your clusters ? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268391#comment-14268391 ] Hudson commented on YARN-3010: -- FAILURE: Integrated in Hadoop-trunk-Commit #6823 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6823/]) YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: 2.7.0 Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268334#comment-14268334 ] Gera Shegalov commented on YARN-2893: - Hi [~ajsquared], what type of jobs are you seeing this with? I think almost all failures for us are Scalding/Cascading jobs, which made me think that it has to do with their multithreaded job submission code. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268382#comment-14268382 ] Wangda Tan commented on YARN-3003: -- The relationship between NMs and labels is more than a bidirectional map. We have a hierarchy for NMs: YARN now supports launching multiple node managers on the same host, so we have host -> list<NM>. And for the node labels administration purpose, an admin can set labels on a host (affects all NMs on that host) OR set labels on a single NM (affects that NM only). I suggest storing the nodes of a label in the NodeLabel class. For now we can store all related nodes, and in the future, we can extend it to support fetching the running NMs associated with a given label. Thanks, Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label to node mapping - given a label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
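Until a first-class server-side API exists, a client can already derive the requested label-to-nodes view from the existing {{YarnClient#getNodeToLabels()}} result. A small sketch follows, assuming the node-to-labels mapping is a {{Map<NodeId, Set<String>>}} as the description states.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;

class LabelsToNodesSketch {
  // Invert the node -> labels map into a label -> nodes map on the client side.
  static Map<String, Set<NodeId>> labelsToNodes(
      Map<NodeId, Set<String>> nodeToLabels) {
    Map<String, Set<NodeId>> result = new HashMap<String, Set<NodeId>>();
    for (Map.Entry<NodeId, Set<String>> e : nodeToLabels.entrySet()) {
      for (String label : e.getValue()) {
        Set<NodeId> nodes = result.get(label);
        if (nodes == null) {
          nodes = new HashSet<NodeId>();
          result.put(label, nodes);
        }
        nodes.add(e.getKey());
      }
    }
    return result;
  }
}
{code}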
[jira] [Created] (YARN-3014) Changing labels on a host should update all NM's labels on that host
Wangda Tan created YARN-3014: Summary: Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3014) Changing labels on a host should update all NM's labels on that host
[ https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-3014: Assignee: Wangda Tan Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3014) Changing labels on a host should update all NM's labels on that host
[ https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3014: - Issue Type: Sub-task (was: Bug) Parent: YARN-2492 Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Admin can either specify labels on a host (by running {{yarn rmadmin -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn rmadmin -replaceLabelsOnNode host1:port,label1}}). If user has specified label=x on a NM (instead of host), and later set the label=y on host of the NM. NM's label should update to y as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268424#comment-14268424 ] Vinod Kumar Vavilapalli commented on YARN-2893: --- Is this in a secure cluster or a non-secure one? Trying to see if we can corner the type of tokens involved. Also, is it possible to patch your clusters locally to have some debug logs in the ResourceManager? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268670#comment-14268670 ] Chengbing Liu commented on YARN-2997: - Yes, a HashMap should be enough. I will upload a new one. Thanks. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268672#comment-14268672 ] Allen Wittenauer commented on YARN-2786: 'cluster' does not come alphabetically after 'node'. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-2997: Attachment: YARN-2997.5.patch Update: use HashMap instead of LinkedHashMap. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.5.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268687#comment-14268687 ] Hadoop QA commented on YARN-2996: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690673/YARN-2996.004.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6274//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6274//console This message is automatically generated. Refine fs operations in FileSystemRMStateStore and few fixes Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268712#comment-14268712 ] Gera Shegalov commented on YARN-2934: - Yes it's related, but not exclusive to AM (try -Dmapreduce.map.env=JAVA_HOME=/no/jvm/here). It's just more severe with AM. cat is not the point. Getting the real diagnostics with something is, +1 for using tail. The pointer to the tracking page can be of little value for a busy cluster. The RMApp is likely to age out by the time the user gets to look at it, and there is no JHS entry because the AM crashed. It would be better to mention the nodeAddress as well, in addition to containerId to be used with 'yarn logs' Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
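A minimal sketch of the tail idea discussed above, assuming the NM can locate the container's stderr file in its log directory; the helper name and the size limit are placeholders, not the eventual patch:
{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Illustrative: return at most the last maxBytes of a container log file so
// it can be appended to the container diagnostics (together with the node
// address and container id for use with 'yarn logs').
static String tailLogFile(File logFile, int maxBytes) throws IOException {
  try (RandomAccessFile raf = new RandomAccessFile(logFile, "r")) {
    long start = Math.max(0, raf.length() - maxBytes);
    byte[] buf = new byte[(int) (raf.length() - start)];
    raf.seek(start);
    raf.readFully(buf);
    return new String(buf, StandardCharsets.UTF_8);
  }
}
{code}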
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268714#comment-14268714 ] Hadoop QA commented on YARN-2997: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690687/YARN-2997.5.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6276//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6276//console This message is automatically generated. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.5.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
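A rough sketch of the behaviour the patch series aims for, as far as it can be read from this discussion: the NM keeps pending completed-container statuses and drops them once the RM acknowledges them, instead of resending them on every heartbeat until the app finishes. Class and method names here are placeholders, not the attached patch.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Illustrative only: completed containers are reported until acknowledged,
// then removed so the RM stops logging "Null container completed".
class CompletedContainerTracker {
  private final Map<ContainerId, ContainerStatus> pendingCompleted = new HashMap<>();

  synchronized void containerFinished(ContainerStatus status) {
    pendingCompleted.put(status.getContainerId(), status);
  }

  /** Statuses to piggyback on the next NM heartbeat. */
  synchronized List<ContainerStatus> containersToReport() {
    return new ArrayList<>(pendingCompleted.values());
  }

  /** Called with the container ids the RM response confirms it has processed. */
  synchronized void ackFromRM(Set<ContainerId> acked) {
    pendingCompleted.keySet().removeAll(acked);
  }
}
{code}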
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-644: - Issue Type: Sub-task (was: Bug) Parent: YARN-662 Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Priority: Minor I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268735#comment-14268735 ] Vinod Kumar Vavilapalli commented on YARN-3011: --- This is a part of YARN-662 - the one about doing sanity-checks. Linking.. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
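In the spirit of the YARN-662 sanity checks mentioned above, a hedged sketch of validating the resource path up front so a malformed request fails only that container's localization instead of killing the NM dispatcher; this is illustrative, not the eventual fix:
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative guard: reject a null/empty resource path while the request is
// being validated, rather than letting Path's IllegalArgumentException
// surface later as a fatal error on the AsyncDispatcher thread.
static Path toValidatedPath(String resourcePath) throws YarnException {
  if (resourcePath == null || resourcePath.trim().isEmpty()) {
    throw new YarnException("Local resource has a null or empty path");
  }
  return new Path(resourcePath);
}
{code}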
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.31.patch See what happens when maxActiveApplications and maxActiveApplicationsPerUser are removed altogether maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() >= getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM (> minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
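A minimal sketch of the resource-based activation check this issue argues for, replacing the count-based maxActiveApplications limits removed in this patch revision; the method and parameter names are placeholders, not the patch itself:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative: activate a pending application only if the AMs' actual
// resource usage (not a fixed application count) stays within the queue's
// max_am_resource limit.
static boolean canActivate(Resource amResourceLimit, // queue capacity * max-am-resource-percent
                           Resource usedAMResource,  // sum of active AM container resources
                           Resource thisAMResource)  // AM resource of the pending application
{
  Resource ifActivated = Resources.add(usedAMResource, thisAMResource);
  return Resources.fitsIn(ifActivated, amResourceLimit);
}
{code}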
[jira] [Updated] (YARN-662) [Umbrella] Enforce required parameters for all the protocols
[ https://issues.apache.org/jira/browse/YARN-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-662: - Summary: [Umbrella] Enforce required parameters for all the protocols (was: Enforce required parameters for all the protocols) [Umbrella] Enforce required parameters for all the protocols Key: YARN-662 URL: https://issues.apache.org/jira/browse/YARN-662 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Assignee: Zhijie Shen All proto fields are marked as optional. We need to mark some of them as required, or enforce this on the server side. Server side is likely better since that's more flexible (example: deprecating a field in favour of another - either of the two must be present). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268736#comment-14268736 ] Sunil G commented on YARN-3016: --- Hi [~wangda], thanks for bringing this up. I have a doubt: do you mean the similar methods in CommonNodeLabelsManager and RMNodeLabelsManager? (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager - Key: YARN-3016 URL: https://issues.apache.org/jira/browse/YARN-3016 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Now we have separate but similar implementations for add/remove/replace labels on nodes in CommonNodeLabelsManager; we should merge them into a single method for easier modification and better readability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
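Purely to illustrate the shape of the refactoring being proposed (not [~wangda]'s actual patch), the three similar code paths could collapse into one method parameterized by the operation:
{code}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

// Illustrative sketch of one method replacing the separate
// internalAddLabels / internalRemoveLabels / internalReplaceLabels paths.
enum NodeLabelOp { ADD, REMOVE, REPLACE }

static void internalUpdateLabelsOnNodes(
    Map<NodeId, Set<String>> labelsOnNode,   // current label state per node
    Map<NodeId, Set<String>> request,        // labels named in the request
    NodeLabelOp op) {
  for (Map.Entry<NodeId, Set<String>> entry : request.entrySet()) {
    Set<String> current = labelsOnNode.get(entry.getKey());
    if (current == null) {
      current = new HashSet<String>();
      labelsOnNode.put(entry.getKey(), current);
    }
    switch (op) {
      case ADD:
        current.addAll(entry.getValue());
        break;
      case REMOVE:
        current.removeAll(entry.getValue());
        break;
      case REPLACE:
        current.clear();
        current.addAll(entry.getValue());
        break;
    }
    // shared post-processing (store update, event dispatch) lives here once
  }
}
{code}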
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3011: -- Issue Type: Sub-task (was: Bug) Parent: YARN-662 NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-644: - Assignee: Varun Saxena Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268752#comment-14268752 ] Vinod Kumar Vavilapalli commented on YARN-2571: --- Apologies for coming in really late on this. bq. startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) bq. app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. None of this is RM responsibility. Similar to creation of user directories on HDFS, this needs to be taken care of by administrators/external systems. bq. bq. attempt, container, app completion: remove service records with the matching persistence and ID This looks like application-level responsibility. Removing records on container-completion can and should be done by the individual apps' ApplicationMasters. Removing records on app completion should be done in an application-cleanup container (YARN-2261). Any use-case for application-attempt level records? RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, YARN-2571-008.patch, YARN-2571-009.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267748#comment-14267748 ] Hudson commented on YARN-2978: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2017 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2017/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267747#comment-14267747 ] Hudson commented on YARN-2427: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2017 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2017/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
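For readers who want to see the API shape, a hedged example of moving an application to another queue through the RM web services; the endpoint and JSON body follow the pattern documented in the ResourceManagerRest page touched by this patch, and the host, application id, and queue name are placeholders:
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hedged example: PUT the target queue to the app's "queue" resource.
public class MoveAppExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://rm-host:8088/ws/v1/cluster/apps/application_1420648881673_0004/queue");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    byte[] body = "{\"queue\": \"analytics\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}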
[jira] [Updated] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-2807: --- Attachment: YARN-2807.3.patch Removed trailing whitespaces. Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as expected. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive with --forceactive when automatic failover is enabled. The option that does work is {{--forcemanual}}, and no place in the usage describes this option. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-2934: Priority: Critical (was: Major) Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267858#comment-14267858 ] Hadoop QA commented on YARN-2807: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690558/YARN-2807.3.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.ipc.TestCallQueueManager Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6268//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6268//console This message is automatically generated. Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
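For reference, the transition that does go through when automatic failover is enabled is the one using the {{--forcemanual}} flag named in the refusal message; a hedged example (the service id is cluster-specific, and the command asks for confirmation before forcing the transition):
{code}
yarn rmadmin -transitionToActive rm2 --forcemanual
{code}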
[jira] [Created] (YARN-3012) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing
Dinh Hoang Mai created YARN-3012: Summary: Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing Key: YARN-3012 URL: https://issues.apache.org/jira/browse/YARN-3012 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Environment: CentOS 64bit Reporter: Dinh Hoang Mai Priority: Critical Fix For: 2.6.0 I have just started to work with Hadoop 2. After installing with basic configs (http://pl.postech.ac.kr/wiki/doku.php?id=maidinh:hadoop2_install), I always failed to run any examples. Has anyone seen this problem and please help me? This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3012) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing
[ https://issues.apache.org/jira/browse/YARN-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Hoang Mai updated YARN-3012: - Description: I have just started to work with Hadoop 2. After installing with basic configs, I always failed to run any examples. Has anyone seen this problem and please help me? This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 was: I have just started to work with Hadoop 2. After installing with basic configs (http://pl.postech.ac.kr/wiki/doku.php?id=maidinh:hadoop2_install), I always failed to run any examples. Has anyone seen this problem and please help me? 
This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 Hadoop 2.6.0: Basic error
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267883#comment-14267883 ] Chris K Wensel commented on YARN-3009: -- First, it's a little odd to put a value in quotes that is part of a query string. but that's a reasonable workaround though non-obvious. Second, this then becomes a bug in Apache Tez DAGClientTimelineImpl since it does not quote values as it builds the query string. fwiw, using quotes to prevent interpreting 7 as a number instead of a string makes a lot of sense. but quoting 7ABDCEFG to make sure it isn't interpreted as a 7 is again non-intuitive. TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
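To make the reported behaviour concrete, a small sketch of a parse rule that only treats a filter value as a number when the entire string is numeric, which is the kind of change this report is asking for; the helper is illustrative, not the timeline server internals:
{code}
// Illustrative: "7" becomes a Long, but "7CCA93..." stays a String instead of
// being truncated to the leading digit during filter parsing.
static Object parseFilterValue(String raw) {
  try {
    return Long.valueOf(raw); // succeeds only if the whole string is numeric
  } catch (NumberFormatException e) {
    return raw;               // otherwise keep the original string
  }
}
{code}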
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-160: --- Attachment: apache-yarn-160.3.patch Uploaded a new patch - apache-yarn-160.3.patch. 1. rebase to trunk 2. add a flag that allows users to turn off detection of underlying hardware. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
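As a rough sketch of what obtaining the values from the underlying OS can look like on Linux (illustrative only, not the attached patch, which also has to cover the configurable offset and the new opt-out flag), assuming /proc/meminfo is readable:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrative: derive NM memory/vcore defaults from the underlying OS.
public class NodeHardware {
  static long physicalMemoryMB() throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {          // e.g. "MemTotal: 16367412 kB"
          String[] parts = line.trim().split("\\s+");
          return Long.parseLong(parts[1]) / 1024;    // kB -> MB
        }
      }
    }
    return -1; // unknown; fall back to the configured value
  }

  static int vcores() {
    return Runtime.getRuntime().availableProcessors();
  }
}
{code}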
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267938#comment-14267938 ] Allen Wittenauer commented on YARN-160: --- bq. RAM-2*HADOOP_HEAPSIZE HADOOP_HEAPSIZE_MAX in trunk. HADOOP_HEAPSIZE was deprecated. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267962#comment-14267962 ] Mayank Bansal commented on YARN-2933: - Thanks [~wangda] for review. 1. Fixed, I should have used it. 2. I think getter and setter should be there. 3. Done 4. Done 5. Test is fixed 6. FInd bug is not due to this patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-5.patch Updating patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267991#comment-14267991 ] Zhijie Shen commented on YARN-2996: --- Almost good to me. Just one nit: can we reuse getFileStatus() method here too? {code} FileStatus status; try { status = fs.getFileStatus(amrmTokenSecretManagerStateDataDir); assert status.isFile(); } catch (FileNotFoundException ex) { return; } {code} Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268020#comment-14268020 ] Hudson commented on YARN-2230: -- FAILURE: Integrated in Hadoop-trunk-Commit #6821 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6821/]) YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) - Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Components: client, documentation, scheduler Affects Versions: 2.4.0 Reporter: Adam Kawa Assignee: Vijay Bhat Priority: Minor Fix For: 2.7.0 Attachments: YARN-2230.001.patch, YARN-2230.002.patch When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java {code} if (resReq.getCapability().getVirtualCores() 0 || resReq.getCapability().getVirtualCores() maximumResource.getVirtualCores()) { throw new InvalidResourceRequestException(Invalid resource request + , requested virtual cores 0 + , or requested virtual cores max configured + , requestedVirtualCores= + resReq.getCapability().getVirtualCores() + , maxVirtualCores= + maximumResource.getVirtualCores()); } {code} According to documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request should be capped to the allocation limit. {code} property descriptionThe maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value./description nameyarn.scheduler.maximum-allocation-vcores/name value32/value /property {code} This means that: * Either documentation or code should be corrected (unless this exception is handled elsewhere accordingly, but it looks that it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g. {code} 2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores 0, or requested virtual cores max configured, requestedVirtualCores=32, maxVirtualCores=3 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237) at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420) . 
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} * IMHO, such an exception should be forwarded to client. Otherwise, it is non obvious to discover why a job does not make any progress. The same looks to be related to memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
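To contrast the two behaviours discussed here, a small sketch of the capping that the yarn-default.xml description promises, versus the validation code quoted above that throws {{InvalidResourceRequestException}}; illustrative only, not a proposed patch:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative: what "requests higher than this get capped" would mean,
// versus validateResourceRequest() rejecting vcores above the maximum.
static Resource capToMaximum(Resource requested, Resource maximum) {
  int memory = Math.min(requested.getMemory(), maximum.getMemory());
  int vcores = Math.min(requested.getVirtualCores(), maximum.getVirtualCores());
  return Resource.newInstance(memory, vcores);
}
{code}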
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268032#comment-14268032 ] Jian He commented on YARN-3010: --- lgtm Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbugs issue was reported recently in the latest trunk: {quote} IS: Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
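For context, a generic sketch of the inconsistent-synchronization pattern findbugs is flagging and the usual remedies (synchronize every access, or make the field volatile); this is not the YARN-3010 patch itself:
{code}
// Generic illustration of the findbugs IS warning: a field written under a
// lock but also read without it.
class SchedulerLike {
  private volatile Object rmContext;   // making the field volatile is one fix

  synchronized void reinitialize(Object newContext) {
    this.rmContext = newContext;       // write under the instance lock
  }

  Object getRMContext() {
    return rmContext;                  // unsynchronized read no longer races
  }
}
{code}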
[jira] [Updated] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3009: Attachment: YARN-3009.20150108-1.patch Attaching the patch TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268050#comment-14268050 ] Rohith commented on YARN-3000: -- Thanks [~aw] for mentioning the jira id. I will close the issue as Not a Problem. YARN_PID_DIR should be visible in yarn-env.sh - Key: YARN-3000 URL: https://issues.apache.org/jira/browse/YARN-3000 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 2.6.0 Reporter: Jeff Zhang Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3000.patch Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not supposed to be the place for users to set up environment variables. IMO, yarn-env.sh is the place for users to set up environment variables, just like hadoop-env.sh, so it's better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment just like YARN_RESOURCEMANAGER_HEAPSIZE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267936#comment-14267936 ] Varun Vasudev commented on YARN-160: The findbugs warnings are unrelated to the patch. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3012) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing
[ https://issues.apache.org/jira/browse/YARN-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Hoang Mai updated YARN-3012: - Environment: Ubuntu 64bit (was: CentOS 64bit) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing - Key: YARN-3012 URL: https://issues.apache.org/jira/browse/YARN-3012 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Environment: Ubuntu 64bit Reporter: Dinh Hoang Mai Priority: Critical Fix For: 2.6.0 I have just started to work with Hadoop 2. After installing with basic configs, I always failed to run any examples. Has anyone seen this problem and please help me? This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.30.patch userAMLimit logic included as well, now with a test :-) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)