[jira] [Commented] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267455#comment-14267455 ] Hadoop QA commented on YARN-3006: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690518/YARN-3006.001.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6267//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6267//console This message is automatically generated. Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-3006.001.patch When executing manual failover with automatic failover enabled, UnsupportedOperationException is thrown. {code} # yarn rmadmin -failover rm1 rm2 Exception in thread main java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51) at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94) at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311) at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622) {code} I'm thinking the above message is confusing to users. (Users may think whether ZKFC is configured correctly...) The command should output error message to stderr instead of throwing Exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267388#comment-14267388 ] Hadoop QA commented on YARN-2996: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690501/YARN-2996.003.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6266//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6266//console This message is automatically generated. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
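A minimal sketch of the first refinement above, assuming the standard Hadoop {{FileSystem}} API; the helper name is hypothetical and not part of the attached patches:
{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsStatusHelper {
  /**
   * Hypothetical helper: issues a single getFileStatus RPC instead of
   * fs.exists(path) followed by fs.getFileStatus(path) (two RPCs).
   * Returns null when the path does not exist.
   */
  static FileStatus getFileStatusIfExists(FileSystem fs, Path path)
      throws IOException {
    try {
      return fs.getFileStatus(path);   // one round trip instead of two
    } catch (FileNotFoundException e) {
      return null;                     // absent path; caller checks for null
    }
  }
}
{code}
The second refinement would apply the same spirit to {{updateFile}}: write the new content once and rename it directly over the target, dropping one of the two renames in the current .tmp -> .new -> final sequence.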
[jira] [Updated] (YARN-3006) Improve the error message when attempting manual failover with auto-failover enabled
[ https://issues.apache.org/jira/browse/YARN-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3006: Attachment: YARN-3006.001.patch Attaching a simple patch. Improve the error message when attempting manual failover with auto-failover enabled Key: YARN-3006 URL: https://issues.apache.org/jira/browse/YARN-3006 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Attachments: YARN-3006.001.patch When executing a manual failover with automatic failover enabled, an UnsupportedOperationException is thrown.
{code}
# yarn rmadmin -failover rm1 rm2
Exception in thread "main" java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address
 at org.apache.hadoop.yarn.client.RMHAServiceTarget.getZKFCAddress(RMHAServiceTarget.java:51)
 at org.apache.hadoop.ha.HAServiceTarget.getZKFCProxy(HAServiceTarget.java:94)
 at org.apache.hadoop.ha.HAAdmin.gracefulFailoverThroughZKFCs(HAAdmin.java:311)
 at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:282)
 at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:449)
 at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:378)
 at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:482)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.yarn.client.cli.RMAdminCLI.main(RMAdminCLI.java:622)
{code}
I think the above message is confusing to users (they may wonder whether ZKFC is configured correctly). The command should output an error message to stderr instead of throwing an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
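The attached YARN-3006.001.patch is not reproduced here; the following is only a rough sketch of the general idea, checking the auto-failover setting up front and reporting on stderr. The class name, method name, default value, and message text are illustrative assumptions, not the actual patch:
{code}
import org.apache.hadoop.conf.Configuration;

public class ManualFailoverGuard {
  /** Illustrative guard for a CLI path; returns a non-zero exit code instead
   *  of letting UnsupportedOperationException escape to the user. */
  static int checkManualFailoverAllowed(Configuration conf) {
    // The default shown here is illustrative; only the property name is from YARN.
    boolean autoFailover = conf.getBoolean(
        "yarn.resourcemanager.ha.automatic-failover.enabled", false);
    if (autoFailover) {
      System.err.println("failover: manual failover is not supported when "
          + "automatic failover is enabled; RM transitions are managed by "
          + "the embedded elector.");
      return -1;
    }
    return 0;   // manual failover may proceed
  }
}
{code}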
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267545#comment-14267545 ] Hudson commented on YARN-2427: -- FAILURE: Integrated in Hadoop-Yarn-trunk #800 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/800/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267663#comment-14267663 ] Hudson commented on YARN-2427: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1998 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1998/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267664#comment-14267664 ] Hudson commented on YARN-2978: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1998 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1998/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267540#comment-14267540 ] Hudson commented on YARN-2978: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/66/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267539#comment-14267539 ] Hudson commented on YARN-2427: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #66 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/66/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267546#comment-14267546 ] Hudson commented on YARN-2978: -- FAILURE: Integrated in Hadoop-Yarn-trunk #800 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/800/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268756#comment-14268756 ] Sunil G commented on YARN-2933: --- Hi [~mayank_bansal] and [~wangda], this is a much-needed implementation with respect to node labels in the preemption scenario. However, I have a concern; please disregard it if this has already been considered. The containers of an application that did not specify any labels at submission time may land on both labelled and non-labelled nodes, correct? If so, with this implementation preemption will always target the containers running on non-labelled nodes, which may not be accurate. So is it possible to preempt only from applications that were submitted without any node labels? -Sunil Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting preemption that respects node labels, but we have some gaps in the code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially require refactoring CS, which we need to think about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regressions like: a cluster has some nodes with labels and some without; assume queueA isn't satisfied for resources without labels, but for now the preemption policy may preempt resources from nodes with labels for queueA, which is not correct. Again, this is just a short-term enhancement; YARN-2498 will consider preemption respecting node labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268767#comment-14268767 ] Gera Shegalov commented on YARN-2934: - bq. Given this, even the tailed stderr is not useful in such a situation. If the app-page ages out, where will the user see this additional diagnostic message that we tail out of logs? It will be in the client output that I showed in the above comments. In our infrastructure, a failed job will generate an alert email containing the client log (or link to it). Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268761#comment-14268761 ] Vinod Kumar Vavilapalli commented on YARN-2934: --- bq. Yes it's related, but not exclusive to AM (try -Dmapreduce.map.env=JAVA_HOME=/no/jvm/here). It's just more severe with AM. Agreed. I was just saying that we can do what we did for AMs. bq. The pointer to the tracking page can be of little value for a busy cluster. The RMApp is likely to age out by the time the user gets to look at it, and there is no JHS entry because the AM crashed. Good point, I missed this one. Given this, even the tailed stderr is not useful in such a situation. If the app-page ages out, where will the user see this additional diagnostic message that we tail out of logs? bq. It would be better to mention the nodeAddress as well, in addition to containerId to be used with 'yarn logs' This can be done in the additional message (like for AM) instead of cat/tail of logs. I guess the options are (1) Diagnostic message with links and reference to the right logs saying something happened or (2) Diagnostic message itself containing the tail of the log (which may or may not really determine the error message). I think (1) is a must, (2) is a good to have. Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
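For option (2), tailing the container's stderr into the diagnostics could look roughly like the sketch below. This only illustrates the mechanics; the file name, size limit, and where the text gets attached are assumptions, not the eventual patch:
{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class StderrTail {
  /** Reads at most maxBytes from the end of a container's stderr file so the
   *  tail can be appended to the container diagnostics. */
  static String tail(File stderrFile, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(stderrFile, "r")) {
      long start = Math.max(0, raf.length() - maxBytes);
      raf.seek(start);
      byte[] buf = new byte[(int) (raf.length() - start)];
      raf.readFully(buf);
      return new String(buf, StandardCharsets.UTF_8);
    }
  }
}
{code}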
[jira] [Created] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
MUFEED USMAN created YARN-3017: -- Summary: ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID Key: YARN-3017 URL: https://issues.apache.org/jira/browse/YARN-3017 Project: Hadoop YARN Issue Type: Improvement Reporter: MUFEED USMAN Priority: Minor Not sure if this should be filed as a bug or not. In the ResourceManager log, in the events surrounding the creation of a new application attempt:
...
2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_02
...
The application attempt has the ID format _1412150883650_0001_02, whereas the associated ContainerID goes by _1412150883650_0001_02_:
...
2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_01, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: <memory:2048, vCores:1, disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
...
Curious to know whether this is kept like that for a reason. If not, then when using filtering tools to, say, grep for events surrounding a specific attempt by the numeric ID part, information may slip through during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.32.patch Check tests using absoluteCapacity for userAmLimit maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it checks whether the app can be activated as follows:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId() + " from user: "
        + application.getUser() + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user actually uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all of the queue's resources instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
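A small arithmetic sketch of the example in the description, showing why counting AMs against {{minimum_allocation}} lets AM usage blow past the intended cap (the numbers are the ones from the example, not from a real cluster):
{code}
public class AmLimitExample {
  public static void main(String[] args) {
    long queueCapacityMB = 1000;   // "1G" queue, using 1G = 1000M as in the example
    double maxAmPercent = 0.2;     // maximum-am-resource-percent
    long minAllocMB = 1;           // minimum_allocation
    long amHeadroomMB = (long) (queueCapacityMB * maxAmPercent);  // 200M for AMs
    long maxAmCount = amHeadroomMB / minAllocMB;                  // 200 AMs "allowed"
    long actualAmMB = 5;           // each AM really asks for 5M (> minimum_allocation)
    long usedByAmsMB = maxAmCount * actualAmMB;                   // 1000M: the whole queue
    System.out.println("AM headroom: " + amHeadroomMB + "M, AMs admitted: "
        + maxAmCount + ", resource actually used by AMs: " + usedByAmsMB + "M");
  }
}
{code}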
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268887#comment-14268887 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690722/YARN-2637.32.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6278//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6278//console This message is automatically generated. maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. 
Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.32.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268791#comment-14268791 ] Hadoop QA commented on YARN-2637: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690699/YARN-2637.31.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6277//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6277//console This message is automatically generated. maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM 
can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID
[ https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268853#comment-14268853 ] Rohith commented on YARN-3017: -- which version are you using? I donot see this behavior in trunk ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID -- Key: YARN-3017 URL: https://issues.apache.org/jira/browse/YARN-3017 Project: Hadoop YARN Issue Type: Improvement Reporter: MUFEED USMAN Priority: Minor Not sure if this should be filed as a bug or not. In the ResourceManager log in the events surrounding the creation of a new application attempt, ... ... 2014-11-14 17:45:37,258 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1412150883650_0001_02 ... ... The application attempt has the ID format _1412150883650_0001_02. Whereas the associated ContainerID goes by _1412150883650_0001_02_. ... ... 2014-11-14 17:45:37,260 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1412150883650_0001_02_01, NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, vCores:1, disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service: 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02 ... ... Curious to know if this is kept like that for a reason. If not while using filtering tools to, say, grep events surrounding a specific attempt by the numeric ID part information may slip out during troubleshooting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268449#comment-14268449 ] Andrew Johnson commented on YARN-2893: -- I'm seeing this error on a non-secure cluster. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268456#comment-14268456 ] Wangda Tan commented on YARN-1743: -- [~zjffdu], Thanks for working on this ticket, the generated graph really helps a lot for people understanding how YARN works! Added the target version to see if we can get it in 2.7.0. And for the POC patch, can we force the annotation type to be Class? Which will makes it can always be automatically updated if we make any changes on type names. Wangda Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
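A rough illustration of the {{Class}}-based annotation idea (the names below are invented for this sketch; the POC patch may look quite different): referencing the event and state types as {{Class}} objects means a rename is caught by the compiler instead of leaving a stale string behind.
{code}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Sketch of a transition annotation that records its endpoints as Class
 *  references rather than free-form strings. */
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
public @interface TransitionInfo {
  Class<?> event();        // the event type that triggers the transition
  Class<?> sourceState();  // state (enum or class) the transition starts from
  Class<?> targetState();  // state the transition ends in
}
{code}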
[jira] [Commented] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268465#comment-14268465 ] Hadoop QA commented on YARN-1743: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654900/NodeManager.pdf against trunk revision e13a484. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6273//console This message is automatically generated. Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1743: - Labels: documentation (was: ) Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Labels: documentation Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268506#comment-14268506 ] Chris K Wensel commented on YARN-3009: -- It feels as if the API is becoming conflated, not to mention brittle, if the filter value can be a nested JSON object instead of a literal value (string or number). If this is a requirement of the API, I would expose a new parameter on the query that clearly states the value should be interpreted as an object. But I suspect this is better served not by a key=nested_object query but by a path/to/attribute=literal_value query (or a composition of them). TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7', causing the filter to fail the search. It should be noted that the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard-to-identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
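Independent of the query-syntax question above, one way to avoid the partial parse described in this issue is to treat a filter value as numeric only when the entire token is a number. This is just a sketch of that idea, not the approach taken by the attached patch:
{code}
public class FilterValueParser {
  /** Returns a Long/Double only when the whole token is numeric;
   *  values such as "7CCA..." stay Strings instead of collapsing to 7. */
  static Object parseFilterValue(String raw) {
    try {
      if (raw.matches("-?\\d+")) {
        return Long.valueOf(raw);
      }
      if (raw.matches("-?\\d*\\.\\d+([eE][-+]?\\d+)?")) {
        return Double.valueOf(raw);
      }
    } catch (NumberFormatException e) {
      // e.g. overflow of a very long digit string; fall through to String
    }
    return raw;
  }
}
{code}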
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268546#comment-14268546 ] Jian He commented on YARN-2997: --- Ah, I think the problem that container statuses whose applications are stopped may be lost on NM resync existed before. Thanks for your clarification. One minor comment: for {{LinkedHashMap<ContainerId, ContainerStatus>()}}, wouldn't a regular HashMap be enough instead of a LinkedHashMap? NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.patch We have seen a lot of the following in the RM log: {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by the NM sending completed containers repeatedly until the app is finished. On the RM side, the container has already been released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150107-1-full.patch Updated against latest trunk Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now
[ https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268570#comment-14268570 ] Hudson commented on YARN-2936: -- FAILURE: Integrated in Hadoop-trunk-Commit #6825 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6825/]) YARN-2936. Changed YARNDelegationTokenIdentifier to set proto fields on getProto method. Contributed by Varun Saxena (jianhe: rev 2638f4d0f0da375b0dd08f3188057637ed0f32d5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java * hadoop-yarn-project/CHANGES.txt YARNDelegationTokenIdentifier doesn't set proto.builder now --- Key: YARN-2936 URL: https://issues.apache.org/jira/browse/YARN-2936 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2936.001.patch, YARN-2936.002.patch, YARN-2936.003.patch, YARN-2936.004.patch, YARN-2936.005.patch, YARN-2936.006.patch After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, such that when constructing a object which extends YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, when we call getProto() of it, we will just get an empty proto object. It seems to do no harm to the production code path, as we will always call getBytes() before using proto to persist the DT in the state store, when we generating the password. I think the setter is removed to avoid duplicating setting the fields why getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work properly alone. YARNDelegationTokenIdentifier is tightly coupled with the logic in secretManager. It's vulnerable if something is changed at secretManager. For example, in the test case of YARN-2837, I spent time to figure out we need to execute getBytes() first to make sure the testing DTs can be properly put into the state store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2786: - Attachment: YARN-2786-20150107-1-without-yarn.cmd.patch Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Attachment: YARN-2996.004.patch Good idea Zhijie, update the patch. Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268587#comment-14268587 ] Yi Liu commented on YARN-3010: -- Thanks [~jianhe] and [~rohithsharma] Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: 2.7.0 Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated YARN-2996: - Summary: Refine fs operations in FileSystemRMStateStore and few fixes (was: Refine some fs operations in FileSystemRMStateStore to improve performance) Refine fs operations in FileSystemRMStateStore and few fixes Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268596#comment-14268596 ] Hudson commented on YARN-2880: -- FAILURE: Integrated in Hadoop-trunk-Commit #6826 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6826/]) Moved YARN-2880 to improvement section in CHANGES.txt (jianhe: rev ef237bd52fc570292a7e608b373b51dd6d1590b8) * hadoop-yarn-project/CHANGES.txt Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled --- Key: YARN-2880 URL: https://issues.apache.org/jira/browse/YARN-2880 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Rohith Fix For: 2.7.0 Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, YARN-2880.1.patch, YARN-2880.2.patch As suggested by [~ozawa], [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. We should have a such test to make sure there will be no regression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268606#comment-14268606 ] Vinod Kumar Vavilapalli commented on YARN-160: -- Quick comments on the patch: - LinuxResourceCalculatorPlugin: numPhysicalSockets is not used anywhere? - WindowsResourceCalculatorPlugin: Why is num-cores set = num-processors ? - yarn-default.xml: Change "it will set the X to Y" to "it will set X to Y by default" - yarn.nodemanager.count-logical-processors-as-cores: Not sure of the use for this. On Linux, shouldn't we simply use the returned numCores if they are valid? And fall back to numProcessors? - yarn.nodemanager.enable-hardware-capability-detection: I think specifying the capabilities to be -1 is already a way to trigger this automatic detection, let's simply drop the flag and assume it to be true all the time? - CGroupsLCEResourceHandler: The log message 'LOG.info("node vcores = " + nodeVCores);' is printed for every container launch. - Should we enforce somewhere that numCores <= numProcessors, if not that it is always a multiple? {code} int containerPhysicalMemoryMB = (int) (0.8f * (physicalMemoryMB - (2 * hadoopHeapSizeMB))); {code} We already have resource.percentage-physical-cpu-limit for CPUs - YARN-2440. How about simply adding a resource.percentage-pmem-limit instead of making it a magic number in the code? Of course, we can have a default reserved percentage. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be available as YARN resource); this would allow reserving mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
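The last point above (replacing the hard-coded {{0.8f}} with a configurable percentage) could look roughly like the following sketch; the property name is simply the one proposed in the comment and is hypothetical, not an existing YARN setting.
{code}
import org.apache.hadoop.conf.Configuration;

class PmemLimitSketch {
  // Hypothetical property name, taken from the suggestion in the comment.
  static final String PMEM_LIMIT_KEY =
      "yarn.nodemanager.resource.percentage-pmem-limit";
  static final float DEFAULT_PMEM_LIMIT_PERCENT = 80.0f;

  // Derive the NM memory resource from a configurable percentage of the
  // detected physical memory instead of a magic 0.8f in the code.
  static int containerMemoryMB(Configuration conf, int physicalMemoryMB) {
    float percent = conf.getFloat(PMEM_LIMIT_KEY, DEFAULT_PMEM_LIMIT_PERCENT);
    return (int) (physicalMemoryMB * percent / 100.0f);
  }
}
{code}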
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268623#comment-14268623 ] Vinod Kumar Vavilapalli commented on YARN-2893: --- That is very interesting. In non-secure mode, strictly in YARN's purview, no tokens really flow from the client to the RM. May be we should look at Scalding/Cascading 's submission code to see if it injects some tokens in non-secure mode too? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
Chris Nauroth created YARN-3015: --- Summary: yarn classpath command should support same options as hadoop classpath. Key: YARN-3015 URL: https://issues.apache.org/jira/browse/YARN-3015 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Chris Nauroth Priority: Minor HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268630#comment-14268630 ] Chris Nauroth commented on YARN-3015: - Thanks to [~aw] for reporting it. yarn classpath command should support same options as hadoop classpath. --- Key: YARN-3015 URL: https://issues.apache.org/jira/browse/YARN-3015 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Chris Nauroth Priority: Minor HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268631#comment-14268631 ] Wangda Tan commented on YARN-2933: -- In addition to the previous comment, I think we put an incorrect #containers for each application when setLabelContainer=true. The usedResource or current in TestProportionalPreemptionPolicy actually means the used resource of nodes without labels. So if we want to have a labeled container in an application, we should make it stay outside of usedResource. So in the patch, before: {code} for (int i = 0; i < used; i += gran) { if (setAMContainer && i == 0) { cLive.add(mockContainer(appAttId, cAlloc, unit, 0)); - } else { + } else if (setLabelContainer && i == 1) { + cLive.add(mockContainer(appAttId, cAlloc, unit, 2)); + } {code} We should add {code} +if (setLabelContainer) { + used++; +} {code} to make it correct. And {{testSkipLabeledContainer}} is fully covered by {{testIdealAllocationForLabels}}, since we have already checked the #containers preempted in each application in {{testIdealAllocationForLabels}}, which implies labeled containers are ignored. A minor suggestion is to rename {{setLabelContainer}} to {{setLabeledContainer}}. Thoughts? Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have a preemption policy to support that. YARN-2498 is targeting to support preemption respecting node labels, but we have some gaps in the code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need refactoring of CS, which we need to spend some time thinking about carefully. For now, what we can do immediately is calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid a regression like: a cluster has some nodes with labels and some not; assume queueA isn't satisfied for resource without label, but for now, the preemption policy may preempt resource from nodes with labels for queueA, which is not correct. Again, it is just a short-term enhancement; YARN-2498 will consider preemption respecting node-labels for the Capacity Scheduler, which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268639#comment-14268639 ] Vinod Kumar Vavilapalli commented on YARN-2934: --- This seems like it is the same as YARN-2013/YARN-2242. For the AM, we ended up simply putting a log-message: "AM Container for \$ApplicationAttemptId exited with exitCode: \$ExitStatus.\nFor more detailed output, check application tracking page: \$TrackingUrl, Then, click on links to logs of each attempt.\n" You really don't want to cat stderr from containers. Containers may run for a very long time, spewing a lot of errors to stderr before finally failing. The NM unconditionally reading logs in such cases will blow up the NM heap. We either do a cross-platform way of tailing the last N bytes (not terribly useful if we cut lines midway through) or, better, simply print a link to take them to the right set of logs. Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
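A cross-platform "last N bytes" read along the lines suggested above could be as simple as the following standalone sketch (not code from any attached patch); it reads only the tail of the file, so a container that spews gigabytes to stderr cannot blow up the NM heap.
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class StderrTailSketch {
  // Read at most maxBytes from the end of the file.
  static String tail(String path, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
      long start = Math.max(0, raf.length() - maxBytes);
      byte[] buf = new byte[(int) (raf.length() - start)];
      raf.seek(start);
      raf.readFully(buf);
      // May start mid-line or mid-character; acceptable for diagnostics.
      return new String(buf, StandardCharsets.UTF_8);
    }
  }
}
{code}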
[jira] [Assigned] (YARN-3015) yarn classpath command should support same options as hadoop classpath.
[ https://issues.apache.org/jira/browse/YARN-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3015: -- Assignee: Varun Saxena yarn classpath command should support same options as hadoop classpath. --- Key: YARN-3015 URL: https://issues.apache.org/jira/browse/YARN-3015 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Chris Nauroth Assignee: Varun Saxena Priority: Minor HADOOP-10903 enhanced the {{hadoop classpath}} command to support optional expansion of the wildcards and bundling the classpath into a jar file containing a manifest with the Class-Path attribute. The other classpath commands should do the same for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268644#comment-14268644 ] Naganarasimha G R commented on YARN-3009: - Hi [~zjshen], Thanks for pointing out the interface. I think we could add a test case which takes some other object like a List/Map as well. bq. would it be sufficient if we did not perform the comparison with the original String when the resulting Object is a List or Map? Or do you think a different approach would be better? As the resulting object can be of any type, and not just List or Map, that would not be feasible. But we can think of it the other way: if the resulting object is a subclass of {{java.lang.Number}}, then we can apply the check I gave earlier, though I am not sure whether even this approach could break in some other case. bq. I would expose a new parameter on the query that clearly states the value should be interpreted as an object. This also seems to be a suitable alternative for this issue; for example, we could take the type of the object [or a flag indicating it is not a wrapper object] as a third field separated by a comma character. bq. better served instead of key=nested_object as path/to/attribute=literal_value (or a composition of them) I did not get this; can you give an example? [~zjshen], which approach would be better? TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
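A compact illustration of the {{java.lang.Number}} check discussed above (not the actual TimelineWebServices code): after parsing, fall back to the original String whenever the parser produced a Number whose string form does not round-trip to the raw value, e.g. {{7CCA...}} being truncated to {{7}}.
{code}
class FilterValueCheckSketch {
  // "parsed" is whatever the existing parser produced for the raw filter value.
  static Object preferStringOverTruncatedNumber(String raw, Object parsed) {
    if (parsed instanceof Number && !parsed.toString().equals(raw.trim())) {
      return raw;     // e.g. "7CCA..." parsed as 7: keep the original String
    }
    return parsed;    // round-trips cleanly (or is not a Number): keep as-is
  }
}
{code}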
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268647#comment-14268647 ] Hadoop QA commented on YARN-2786: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690672/YARN-2786-20150107-1-without-yarn.cmd.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6275//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6275//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6275//console This message is automatically generated. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager
Wangda Tan created YARN-3016: Summary: (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager Key: YARN-3016 URL: https://issues.apache.org/jira/browse/YARN-3016 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Now we have separate but similar implementations for add/remove/replace labels on node in CommonNodeLabelsManager; we should merge them into a single one to make them easier to modify and more readable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268110#comment-14268110 ] Hadoop QA commented on YARN-3009: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690592/YARN-3009.20150108-1.patch against trunk revision fe8d2bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6272//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6272//console This message is automatically generated. TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268162#comment-14268162 ] Rohith commented on YARN-3013: -- AYS#containerLaunchedOnNode and AYS#killOrphanContainerOnNode do not hold the synchronized lock. Adding synchronization to both of the methods should be fine. There will not be any interlocking caused by fixing this. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
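For readers unfamiliar with this findbugs category, a minimal standalone illustration of IS2_INCONSISTENT_SYNC follows (hypothetical class, not the YARN code): a field is guarded by the instance lock in most methods but read without it in one, and synchronizing the remaining accessor is exactly the kind of fix being proposed here.
{code}
class InconsistentSyncSketch {
  private Object rmContextLike;

  synchronized void set(Object ctx) {
    rmContextLike = ctx;             // write under the instance lock
  }

  // Before: Object get() { return rmContextLike; }   // unlocked read -> IS2 warning
  synchronized Object get() {
    return rmContextLike;            // read under the same lock, warning resolved
  }
}
{code}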
[jira] [Assigned] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3013: Assignee: Rohith Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3013: - Attachment: 0001-YARN-3013.patch Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268179#comment-14268179 ] Rohith commented on YARN-3013: -- [~zjshen] kindly review analysis and patch Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3013. --- Resolution: Duplicate Close it as the duplicate. Thanks for pointing it out. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
Zhijie Shen created YARN-3013: - Summary: Findbugs warning aboutAbstractYarnScheduler.rmContext Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268160#comment-14268160 ] Billie Rinaldi commented on YARN-3009: -- bq. the nested Json structure will be mistaken as a string Okay, would it be sufficient if we did not perform the comparison with the original String when the resulting Object is a List or Map? Or do you think a different approach would be better? TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268171#comment-14268171 ] Rohith commented on YARN-3013: -- There will not be any possibility of deadlock by adding the synchronized keyword to the methods. It should be fine. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268193#comment-14268193 ] Andrew Johnson commented on YARN-2893: -- I've also noticed that if multiple jobs are submitted at the same time and this error occurs, all the jobs will fail. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268188#comment-14268188 ] Rohith commented on YARN-3010: -- +1(non-binding) LGTM. Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2637: - Summary: maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. (was: maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue is calculated in the following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when a new application is submitted to the RM, it will check whether the app can be activated in the following way: {code} for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() >= getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName()); } } {code} An example is: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M; assuming minimum_allocation = 1M, the number of AMs that can be launched is 200, and if a user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all resources of the queue instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
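One way to read the new summary is that activation should check AM resource usage for both the queue and the user, not just application counts. A deliberately simplified, memory-only sketch of such a check follows (plain ints and hypothetical names, not the scheduler's actual Resource-based code).
{code}
class AmLimitCheckSketch {
  // Activate a pending app only if both the queue's and the user's current AM
  // usage, plus this AM's resource, still fit under the respective AM limits.
  static boolean canActivate(int queueAmUsedMB, int userAmUsedMB, int amMB,
                             int queueAmLimitMB, int userAmLimitMB) {
    return queueAmUsedMB + amMB <= queueAmLimitMB
        && userAmUsedMB + amMB <= userAmLimitMB;
  }
}
{code}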
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268183#comment-14268183 ] Andrew Johnson commented on YARN-2893: -- I am also encountering this same error. The failures are pretty sporadic and I've never been able to reproduce it. Resubmitting the failed job always works, however. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268184#comment-14268184 ] Rohith commented on YARN-3013: -- Oh, I had not seen it earlier!! This issue is a duplicate of YARN-3010. Findbugs warning aboutAbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268359#comment-14268359 ] Andrew Johnson commented on YARN-2893: -- No, it's at least 95% Scalding jobs. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3014) Changing labels on a host should update all NM's labels on that host
[ https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3014: - Description: Admin can either specify labels on a host (by running {{yarn rmadmin -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn rmadmin -replaceLabelsOnNode host1:port,label1}}). If user has specified label=x on a NM (instead of host), and later set the label=y on host of the NM. NM's label should update to y as well. Target Version/s: 2.7.0 Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Admin can either specify labels on a host (by running {{yarn rmadmin -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn rmadmin -replaceLabelsOnNode host1:port,label1}}). If user has specified label=x on a NM (instead of host), and later set the label=y on host of the NM. NM's label should update to y as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268403#comment-14268403 ] Jason Lowe commented on YARN-2902: -- Thanks for the patch, Varun! I think the patch will prevent us from leaking the bookkeeping in resource trackers for resources in the downloading state, but it relies on the periodic retention checking and doesn't address the leaking of data on the disk. The localizer has probably created a partial *_tmp file/dir for the download that didn't complete, and we should be cleaning that up as well. As is, we won't try to clean up any leaked DOWNLOADING resource until the retention process runs (on the order of tens of minutes), but we shouldn't need to wait around to reap resources that aren't really downloading. I haven't had time to work this all the way through, but I'm wondering if we're patching the symptoms rather than the root cause. The resource is lingering around in the DOWNLOADING state because a container was killed and we then forgot the corresponding localizer that was associated with the container. When the localizer later heartbeats in, the NM tells the unknown localizer to DIE, and that ultimately is what leads to a resource lingering around in the DOWNLOADING state. I think we should be properly cleaning up localizers corresponding to killed containers and sending appropriate events to the LocalizedResources. This will then cause the resources to transition out of the DOWNLOADING state to something appropriate, sending the proper events to any other containers that are pending on that resource. At that point we can also clean up any leaked _tmp files/dirs from the failed/killed localizer. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
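The "_tmp cleanup" mentioned above could look roughly like the following helper (hypothetical; it assumes the partial download lives next to the target as <name>_tmp, per the "*_tmp file/dir" wording in the comment, and the eventual patch may organize this differently).
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class PartialDownloadCleanupSketch {
  // When a localizer is killed mid-download, remove the partial sibling the
  // download would have been written to. Assumes a <target>_tmp layout.
  static void cleanupPartialDownload(FileSystem lfs, Path target)
      throws IOException {
    Path tmp = new Path(target.getParent(), target.getName() + "_tmp");
    lfs.delete(tmp, true);   // recursive; returns false if tmp does not exist
  }
}
{code}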
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268340#comment-14268340 ] Andrew Johnson commented on YARN-2893: -- [~jira.shegalov] This is always with Scalding jobs. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268356#comment-14268356 ] Gera Shegalov commented on YARN-2893: - Is there a significant fraction of other type of jobs on your clusters ? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268391#comment-14268391 ] Hudson commented on YARN-3010: -- FAILURE: Integrated in Hadoop-trunk-Commit #6823 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6823/]) YARN-3010. Fixed findbugs warning in AbstractYarnScheduler. Contributed by Yi Liu (jianhe: rev e13a484a2be64fb781c5eca5ae7056cbe194ac5e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: 2.7.0 Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbug issues reported recently in latest trunk: {quote} ISInconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268334#comment-14268334 ] Gera Shegalov commented on YARN-2893: - Hi [~ajsquared], what type of jobs are you seeing this with? I think almost all failures for us are Scalding/Cascading jobs, which made me think that it has to do with their multithreaded job submission code. AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3003) Provide API for client to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268382#comment-14268382 ] Wangda Tan commented on YARN-3003: -- The relationship between NMs and labels is more than a bidirectional map. We have a hierarchy for NMs: YARN now supports launching multiple node managers on the same host, so we have host -> list<NM>. And for the node labels administration purpose, an admin can set labels on a host (affects all NMs on that host) OR set labels on a single NM (affects that NM only). I suggest storing the nodes of a label in the NodeLabel class. For now we can store all related nodes, and in the future, we can extend it to support fetching the running NMs associated with a given label. Thanks, Provide API for client to retrieve label to node mapping Key: YARN-3003 URL: https://issues.apache.org/jira/browse/YARN-3003 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Ted Yu Assignee: Varun Saxena Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to the set of labels associated with the node. A client (such as Slider) may be interested in the label to node mapping - given a label, return the nodes with this label. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
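Until a first-class server-side API exists, a client can already derive the requested label-to-nodes view from the existing {{YarnClient#getNodeToLabels()}} result. A small sketch follows, assuming the node-to-labels mapping is a {{Map<NodeId, Set<String>>}} as the description states.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.NodeId;

class LabelsToNodesSketch {
  // Invert the node -> labels map into a label -> nodes map on the client side.
  static Map<String, Set<NodeId>> labelsToNodes(
      Map<NodeId, Set<String>> nodeToLabels) {
    Map<String, Set<NodeId>> result = new HashMap<String, Set<NodeId>>();
    for (Map.Entry<NodeId, Set<String>> e : nodeToLabels.entrySet()) {
      for (String label : e.getValue()) {
        Set<NodeId> nodes = result.get(label);
        if (nodes == null) {
          nodes = new HashSet<NodeId>();
          result.put(label, nodes);
        }
        nodes.add(e.getKey());
      }
    }
    return result;
  }
}
{code}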
[jira] [Created] (YARN-3014) Changing labels on a host should update all NM's labels on that host
Wangda Tan created YARN-3014: Summary: Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3014) Changing labels on a host should update all NM's labels on that host
[ https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-3014: Assignee: Wangda Tan Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3014) Changing labels on a host should update all NM's labels on that host
[ https://issues.apache.org/jira/browse/YARN-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3014: - Issue Type: Sub-task (was: Bug) Parent: YARN-2492 Changing labels on a host should update all NM's labels on that host Key: YARN-3014 URL: https://issues.apache.org/jira/browse/YARN-3014 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Admin can either specify labels on a host (by running {{yarn rmadmin -replaceLabelsOnNode host1,label1}}) OR on a single NM (by running {{yarn rmadmin -replaceLabelsOnNode host1:port,label1}}). If user has specified label=x on a NM (instead of host), and later set the label=y on host of the NM. NM's label should update to y as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2893) AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268424#comment-14268424 ] Vinod Kumar Vavilapalli commented on YARN-2893: --- Is this in a secure cluster or a non-secure one? Trying to see if we can corner the type of tokens involved. Also, is it possible to patch your clusters locally to have some debug logs in the ResourceManager? AMLaucher: sporadic job failures due to EOFException in readTokenStorageStream -- Key: YARN-2893 URL: https://issues.apache.org/jira/browse/YARN-2893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Gera Shegalov MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268670#comment-14268670 ] Chengbing Liu commented on YARN-2997: - Yes, a HashMap should be enough. I will upload a new one. Thanks. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection
[ https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268672#comment-14268672 ] Allen Wittenauer commented on YARN-2786: 'cluster' does not come alphabetically after 'node'. Create yarn cluster CLI to enable list node labels collection - Key: YARN-2786 URL: https://issues.apache.org/jira/browse/YARN-2786 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch, YARN-2786-20141104-1-full.patch, YARN-2786-20141104-1-without-yarn.cmd.patch, YARN-2786-20141104-2-full.patch, YARN-2786-20141104-2-without-yarn.cmd.patch, YARN-2786-20150107-1-full.patch, YARN-2786-20150107-1-without-yarn.cmd.patch With YARN-2778, we can list node labels on existing RM nodes. But it is not enough, we should be able to: 1) list node labels collection The command should start with yarn cluster ..., in the future, we can add more functionality to the yarnClusterCLI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengbing Liu updated YARN-2997: Attachment: YARN-2997.5.patch Update: use HashMap instead of LinkedHashMap. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.5.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine fs operations in FileSystemRMStateStore and few fixes
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268687#comment-14268687 ] Hadoop QA commented on YARN-2996: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690673/YARN-2996.004.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6274//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6274//console This message is automatically generated. Refine fs operations in FileSystemRMStateStore and few fixes Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch, YARN-2996.004.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268712#comment-14268712 ] Gera Shegalov commented on YARN-2934: - Yes it's related, but not exclusive to AM (try -Dmapreduce.map.env=JAVA_HOME=/no/jvm/here). It's just more severe with AM. cat is not the point. Getting the real diagnostics with something is, +1 for using tail. The pointer to the tracking page can be of little value for a busy cluster. The RMApp is likely to age out by the time the user gets to look at it, and there is no JHS entry because the AM crashed. It would be better to mention the nodeAddress as well, in addition to containerId to be used with 'yarn logs' Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
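A minimal sketch of the tail idea discussed above, assuming the NM can locate the container's stderr file in its log directory; the helper name and the size limit are placeholders, not the eventual patch:
{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Illustrative: return at most the last maxBytes of a container log file so
// it can be appended to the container diagnostics (together with the node
// address and container id for use with 'yarn logs').
static String tailLogFile(File logFile, int maxBytes) throws IOException {
  try (RandomAccessFile raf = new RandomAccessFile(logFile, "r")) {
    long start = Math.max(0, raf.length() - maxBytes);
    byte[] buf = new byte[(int) (raf.length() - start)];
    raf.seek(start);
    raf.readFully(buf);
    return new String(buf, StandardCharsets.UTF_8);
  }
}
{code}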
[jira] [Commented] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268714#comment-14268714 ] Hadoop QA commented on YARN-2997: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690687/YARN-2997.5.patch against trunk revision ef237bd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6276//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6276//console This message is automatically generated. NM keeps sending finished containers to RM until app is finished Key: YARN-2997 URL: https://issues.apache.org/jira/browse/YARN-2997 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, YARN-2997.5.patch, YARN-2997.patch We have seen in RM log a lot of {quote} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed... {quote} It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
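A rough sketch of the behaviour the patch series aims for, as far as it can be read from this discussion: the NM keeps pending completed-container statuses and drops them once the RM acknowledges them, instead of resending them on every heartbeat until the app finishes. Class and method names here are placeholders, not the attached patch.
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Illustrative only: completed containers are reported until acknowledged,
// then removed so the RM stops logging "Null container completed".
class CompletedContainerTracker {
  private final Map<ContainerId, ContainerStatus> pendingCompleted = new HashMap<>();

  synchronized void containerFinished(ContainerStatus status) {
    pendingCompleted.put(status.getContainerId(), status);
  }

  /** Statuses to piggyback on the next NM heartbeat. */
  synchronized List<ContainerStatus> containersToReport() {
    return new ArrayList<>(pendingCompleted.values());
  }

  /** Called with the container ids the RM response confirms it has processed. */
  synchronized void ackFromRM(Set<ContainerId> acked) {
    pendingCompleted.keySet().removeAll(acked);
  }
}
{code}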
[jira] [Updated] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-644: - Issue Type: Sub-task (was: Bug) Parent: YARN-662 Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Priority: Minor I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268735#comment-14268735 ] Vinod Kumar Vavilapalli commented on YARN-3011: --- This is a part of YARN-662 - the one about doing sanity-checks. Linking.. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
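In the spirit of the YARN-662 sanity checks mentioned above, a hedged sketch of validating the resource path up front so a malformed request fails only that container's localization instead of killing the NM dispatcher; this is illustrative, not the eventual fix:
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Illustrative guard: reject a null/empty resource path while the request is
// being validated, rather than letting Path's IllegalArgumentException
// surface later as a fatal error on the AsyncDispatcher thread.
static Path toValidatedPath(String resourcePath) throws YarnException {
  if (resourcePath == null || resourcePath.trim().isEmpty()) {
    throw new YarnException("Local resource has a null or empty path");
  }
  return new Path(resourcePath);
}
{code}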
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications.
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.31.patch See what happens when maxActiveApplications and maxActiveApplicationsPerUser are removed altogether maximum-am-resource-percent could be respected for both LeafQueue/User when trying to activate applications. Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.31.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() >= getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM (> minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
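A minimal sketch of the resource-based activation check this issue argues for, replacing the count-based maxActiveApplications limits removed in this patch revision; the method and parameter names are placeholders, not the patch itself:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative: activate a pending application only if the AMs' actual
// resource usage (not a fixed application count) stays within the queue's
// max_am_resource limit.
static boolean canActivate(Resource amResourceLimit, // queue capacity * max-am-resource-percent
                           Resource usedAMResource,  // sum of active AM container resources
                           Resource thisAMResource)  // AM resource of the pending application
{
  Resource ifActivated = Resources.add(usedAMResource, thisAMResource);
  return Resources.fitsIn(ifActivated, amResourceLimit);
}
{code}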
[jira] [Updated] (YARN-662) [Umbrella] Enforce required parameters for all the protocols
[ https://issues.apache.org/jira/browse/YARN-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-662: - Summary: [Umbrella] Enforce required parameters for all the protocols (was: Enforce required parameters for all the protocols) [Umbrella] Enforce required parameters for all the protocols Key: YARN-662 URL: https://issues.apache.org/jira/browse/YARN-662 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Assignee: Zhijie Shen All proto fields are marked as optional. We need to mark some of them as required, or enforce this on the server side. Server side is likely better since that's more flexible (example: deprecating a field in favour of another - either of the two must be present). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3016) (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268736#comment-14268736 ] Sunil G commented on YARN-3016: --- Hi [~wangda], thanks for bringing this up. I have a doubt: do you mean the similar methods in CommonNodeLabelsManager and RMNodeLabelsManager? (Refactoring) Merge internalAdd/Remove/ReplaceLabels to one method in CommonNodeLabelsManager - Key: YARN-3016 URL: https://issues.apache.org/jira/browse/YARN-3016 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Now we have separate but similar implementations for add/remove/replace labels on nodes in CommonNodeLabelsManager; we should merge them into a single method for easier modification and better readability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
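Purely to illustrate the shape of the refactoring being proposed (not [~wangda]'s actual patch), the three similar code paths could collapse into one method parameterized by the operation:
{code}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

// Illustrative sketch of one method replacing the separate
// internalAddLabels / internalRemoveLabels / internalReplaceLabels paths.
enum NodeLabelOp { ADD, REMOVE, REPLACE }

static void internalUpdateLabelsOnNodes(
    Map<NodeId, Set<String>> labelsOnNode,   // current label state per node
    Map<NodeId, Set<String>> request,        // labels named in the request
    NodeLabelOp op) {
  for (Map.Entry<NodeId, Set<String>> entry : request.entrySet()) {
    Set<String> current = labelsOnNode.get(entry.getKey());
    if (current == null) {
      current = new HashSet<String>();
      labelsOnNode.put(entry.getKey(), current);
    }
    switch (op) {
      case ADD:
        current.addAll(entry.getValue());
        break;
      case REMOVE:
        current.removeAll(entry.getValue());
        break;
      case REPLACE:
        current.clear();
        current.addAll(entry.getValue());
        break;
    }
    // shared post-processing (store update, event dispatch) lives here once
  }
}
{code}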
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3011: -- Issue Type: Sub-task (was: Bug) Parent: YARN-662 NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-644: - Assignee: Varun Saxena Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Varun Saxena Priority: Minor I see that validation/ null check is not performed on passed in parameters. Ex. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest() I guess we should add these checks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268752#comment-14268752 ] Vinod Kumar Vavilapalli commented on YARN-2571: --- Apologies for coming in really late on this. bq. startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) bq. app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. None of this is RM responsibility. Similar to creation of user directories on HDFS, this needs to be taken care of by administrators/external systems. bq. bq. attempt, container, app completion: remove service records with the matching persistence and ID This looks like application-level responsibility. Removing records on container-completion can and should be done by the individual apps' ApplicationMasters. Removing records on app completion should be done in an application-cleanup container (YARN-2261). Any use-case for application-attempt level records? RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, YARN-2571-008.patch, YARN-2571-009.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2978) ResourceManager crashes with NPE while getting queue info
[ https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267748#comment-14267748 ] Hudson commented on YARN-2978: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2017 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2017/]) YARN-2978. Fixed potential NPE while getting queue info. Contributed by Varun Saxena (jianhe: rev dd57c2047bfd21910acc38c98153eedf1db75169) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java ResourceManager crashes with NPE while getting queue info - Key: YARN-2978 URL: https://issues.apache.org/jira/browse/YARN-2978 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Varun Saxena Priority: Critical Labels: capacityscheduler, resourcemanager Fix For: 2.7.0 Attachments: YARN-2978.001.patch, YARN-2978.002.patch, YARN-2978.003.patch, YARN-2978.004.patch java.lang.NullPointerException at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625) at org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290) at org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2427) Add support for moving apps between queues in RM web services
[ https://issues.apache.org/jira/browse/YARN-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267747#comment-14267747 ] Hudson commented on YARN-2427: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2017 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2017/]) YARN-2427. Added the API of moving apps between queues in RM web services. Contributed by Varun Vasudev. (zjshen: rev 60103fca04dc713183e4ec9e12f961642e7d1001) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/JAXBContextResolver.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java Add support for moving apps between queues in RM web services - Key: YARN-2427 URL: https://issues.apache.org/jira/browse/YARN-2427 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-2427.0.patch, apache-yarn-2427.1.patch, apache-yarn-2427.2.patch, apache-yarn-2427.3.patch, apache-yarn-2427.4.patch Support for moving apps from one queue to another is now present in CapacityScheduler and FairScheduler. We should expose the functionality via RM web services as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
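For readers who want to see the API shape, a hedged example of moving an application to another queue through the RM web services; the endpoint and JSON body follow the pattern documented in the ResourceManagerRest page touched by this patch, and the host, application id, and queue name are placeholders:
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hedged example: PUT the target queue to the app's "queue" resource.
public class MoveAppExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://rm-host:8088/ws/v1/cluster/apps/application_1420648881673_0004/queue");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    byte[] body = "{\"queue\": \"analytics\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}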
[jira] [Updated] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-2807: --- Attachment: YARN-2807.3.patch Removed trailing whitespaces. Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But --forceactive does not work as expected. When transitioning the RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive with --forceactive when automatic failover is enabled. The option that does work is {{--forcemanual}}, and no place in the usage describes this option. I think we should fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated YARN-2934: Priority: Critical (was: Major) Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2807) Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive
[ https://issues.apache.org/jira/browse/YARN-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267858#comment-14267858 ] Hadoop QA commented on YARN-2807: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690558/YARN-2807.3.patch against trunk revision 788ee35. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.ipc.TestCallQueueManager Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6268//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6268//console This message is automatically generated. Option --forceactive not works as described in usage of yarn rmadmin -transitionToActive Key: YARN-2807 URL: https://issues.apache.org/jira/browse/YARN-2807 Project: Hadoop YARN Issue Type: Sub-task Components: documentation, resourcemanager Reporter: Wangda Tan Assignee: Masatake Iwasaki Priority: Minor Attachments: YARN-2807.1.patch, YARN-2807.2.patch, YARN-2807.3.patch Currently the help message of yarn rmadmin -transitionToActive is: {code} transitionToActive: incorrect number of arguments Usage: HAAdmin [-transitionToActive serviceId [--forceactive]] {code} But the --forceactive not works as expected. When transition RM state with --forceactive: {code} yarn rmadmin -transitionToActive rm2 --forceactive Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@64c9f31e Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state. If you are very sure you know what you are doing, please specify the forcemanual flag. {code} As shown above, we still cannot transitionToActive when automatic failover is enabled with --forceactive. The option can work is: {{--forcemanual}}, there's no place in usage describes this option. I think we should fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
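For reference, the transition that does go through when automatic failover is enabled is the one using the {{--forcemanual}} flag named in the refusal message; a hedged example (the service id is cluster-specific, and the command asks for confirmation before forcing the transition):
{code}
yarn rmadmin -transitionToActive rm2 --forcemanual
{code}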
[jira] [Created] (YARN-3012) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing
Dinh Hoang Mai created YARN-3012: Summary: Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing Key: YARN-3012 URL: https://issues.apache.org/jira/browse/YARN-3012 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Environment: CentOS 64bit Reporter: Dinh Hoang Mai Priority: Critical Fix For: 2.6.0 I have just started to work with Hadoop 2. After installing with basic configs (http://pl.postech.ac.kr/wiki/doku.php?id=maidinh:hadoop2_install), I always failed to run any examples. Has anyone seen this problem and please help me? This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3012) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing
[ https://issues.apache.org/jira/browse/YARN-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Hoang Mai updated YARN-3012: - Description: I have just started to work with Hadoop 2. After installing with basic configs, I always failed to run any examples. Has anyone seen this problem and please help me? This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 was: I have just started to work with Hadoop 2. After installing with basic configs (http://pl.postech.ac.kr/wiki/doku.php?id=maidinh:hadoop2_install), I always failed to run any examples. Has anyone seen this problem and please help me? 
This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 Hadoop 2.6.0: Basic error
[jira] [Commented] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267883#comment-14267883 ] Chris K Wensel commented on YARN-3009: -- First, it's a little odd to put a value in quotes that is part of a query string. but that's a reasonable workaround though non-obvious. Second, this then becomes a bug in Apache Tez DAGClientTimelineImpl since it does not quote values as it builds the query string. fwiw, using quotes to prevent interpreting 7 as a number instead of a string makes a lot of sense. but quoting 7ABDCEFG to make sure it isn't interpreted as a 7 is again non-intuitive. TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
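To make the reported behaviour concrete, a small sketch of a parse rule that only treats a filter value as a number when the entire string is numeric, which is the kind of change this report is asking for; the helper is illustrative, not the timeline server internals:
{code}
// Illustrative: "7" becomes a Long, but "7CCA93..." stays a String instead of
// being truncated to the leading digit during filter parsing.
static Object parseFilterValue(String raw) {
  try {
    return Long.valueOf(raw); // succeeds only if the whole string is numeric
  } catch (NumberFormatException e) {
    return raw;               // otherwise keep the original string
  }
}
{code}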
[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-160: --- Attachment: apache-yarn-160.3.patch Uploaded a new patch - apache-yarn-160.3.patch. 1. rebase to trunk 2. add a flag that allows users to turn off detection of underlying hardware. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
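As a rough sketch of what obtaining the values from the underlying OS can look like on Linux (illustrative only, not the attached patch, which also has to cover the configurable offset and the new opt-out flag), assuming /proc/meminfo is readable:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrative: derive NM memory/vcore defaults from the underlying OS.
public class NodeHardware {
  static long physicalMemoryMB() throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {          // e.g. "MemTotal: 16367412 kB"
          String[] parts = line.trim().split("\\s+");
          return Long.parseLong(parts[1]) / 1024;    // kB -> MB
        }
      }
    }
    return -1; // unknown; fall back to the configured value
  }

  static int vcores() {
    return Runtime.getRuntime().availableProcessors();
  }
}
{code}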
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267938#comment-14267938 ] Allen Wittenauer commented on YARN-160: --- bq. RAM-2*HADOOP_HEAPSIZE HADOOP_HEAPSIZE_MAX in trunk. HADOOP_HEAPSIZE was deprecated. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267962#comment-14267962 ] Mayank Bansal commented on YARN-2933: - Thanks [~wangda] for review. 1. Fixed, I should have used it. 2. I think getter and setter should be there. 3. Done 4. Done 5. Test is fixed 6. FInd bug is not due to this patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-5.patch Updating patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2996) Refine some fs operations in FileSystemRMStateStore to improve performance
[ https://issues.apache.org/jira/browse/YARN-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267991#comment-14267991 ] Zhijie Shen commented on YARN-2996: --- Almost good to me. Just one nit: can we reuse getFileStatus() method here too? {code} FileStatus status; try { status = fs.getFileStatus(amrmTokenSecretManagerStateDataDir); assert status.isFile(); } catch (FileNotFoundException ex) { return; } {code} Refine some fs operations in FileSystemRMStateStore to improve performance -- Key: YARN-2996 URL: https://issues.apache.org/jira/browse/YARN-2996 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Yi Liu Assignee: Yi Liu Attachments: YARN-2996.001.patch, YARN-2996.002.patch, YARN-2996.003.patch In {{FileSystemRMStateStore}}, we can refine some fs operations to improve performance: *1.* There are several places invoke {{fs.exists}}, then {{fs.getFileStatus}}, we can merge them to save one RPC call {code} if (fs.exists(versionNodePath)) { FileStatus status = fs.getFileStatus(versionNodePath); {code} *2.* {code} protected void updateFile(Path outputPath, byte[] data) throws Exception { Path newPath = new Path(outputPath.getParent(), outputPath.getName() + .new); // use writeFile to make sure .new file is created atomically writeFile(newPath, data); replaceFile(newPath, outputPath); } {code} The {{updateFile}} is not good too, it write file to _output\_file_.tmp, then rename to _output\_file_.new, then rename it to _output\_file_, we can reduce one rename operation. Also there is one unnecessary import, we can remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)
[ https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268020#comment-14268020 ] Hudson commented on YARN-2230: -- FAILURE: Integrated in Hadoop-trunk-Commit #6821 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6821/]) YARN-2230. Fixed few configs description in yarn-default.xml. Contributed by Vijay Bhat (jianhe: rev fe8d2bd74175e7ad521bc310c41a367c0946d6ec) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code) - Key: YARN-2230 URL: https://issues.apache.org/jira/browse/YARN-2230 Project: Hadoop YARN Issue Type: Bug Components: client, documentation, scheduler Affects Versions: 2.4.0 Reporter: Adam Kawa Assignee: Vijay Bhat Priority: Minor Fix For: 2.7.0 Attachments: YARN-2230.001.patch, YARN-2230.002.patch When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), then InvalidResourceRequestException is thrown - https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java {code} if (resReq.getCapability().getVirtualCores() 0 || resReq.getCapability().getVirtualCores() maximumResource.getVirtualCores()) { throw new InvalidResourceRequestException(Invalid resource request + , requested virtual cores 0 + , or requested virtual cores max configured + , requestedVirtualCores= + resReq.getCapability().getVirtualCores() + , maxVirtualCores= + maximumResource.getVirtualCores()); } {code} According to documentation - yarn-default.xml http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml, the request should be capped to the allocation limit. {code} property descriptionThe maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value./description nameyarn.scheduler.maximum-allocation-vcores/name value32/value /property {code} This means that: * Either documentation or code should be corrected (unless this exception is handled elsewhere accordingly, but it looks that it is not). This behavior is confusing, because when such a job (with mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g. {code} 2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_01 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores 0, or requested virtual cores max configured, requestedVirtualCores=32, maxVirtualCores=3 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237) at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420) . 
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} * IMHO, such an exception should be forwarded to client. Otherwise, it is non obvious to discover why a job does not make any progress. The same looks to be related to memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
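To contrast the two behaviours discussed here, a small sketch of the capping that the yarn-default.xml description promises, versus the validation code quoted above that throws {{InvalidResourceRequestException}}; illustrative only, not a proposed patch:
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative: what "requests higher than this get capped" would mean,
// versus validateResourceRequest() rejecting vcores above the maximum.
static Resource capToMaximum(Resource requested, Resource maximum) {
  int memory = Math.min(requested.getMemory(), maximum.getMemory());
  int vcores = Math.min(requested.getVirtualCores(), maximum.getVirtualCores());
  return Resource.newInstance(memory, vcores);
}
{code}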
[jira] [Commented] (YARN-3010) Fix recent findbug issue in AbstractYarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268032#comment-14268032 ] Jian He commented on YARN-3010: --- lgtm Fix recent findbug issue in AbstractYarnScheduler - Key: YARN-3010 URL: https://issues.apache.org/jira/browse/YARN-3010 Project: Hadoop YARN Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: YARN-3010.001.patch, YARN-3010.002.patch A new findbugs issue was reported recently in the latest trunk: {quote} IS: Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext; locked 91% of time {quote} https://issues.apache.org/jira/browse/YARN-2996?focusedCommentId=14265760&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14265760 https://builds.apache.org/job/PreCommit-YARN-Build/6249//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
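For context, a generic sketch of the inconsistent-synchronization pattern findbugs is flagging and the usual remedies (synchronize every access, or make the field volatile); this is not the YARN-3010 patch itself:
{code}
// Generic illustration of the findbugs IS warning: a field written under a
// lock but also read without it.
class SchedulerLike {
  private volatile Object rmContext;   // making the field volatile is one fix

  synchronized void reinitialize(Object newContext) {
    this.rmContext = newContext;       // write under the instance lock
  }

  Object getRMContext() {
    return rmContext;                  // unsynchronized read no longer races
  }
}
{code}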
[jira] [Updated] (YARN-3009) TimelineWebServices always parses primary and secondary filters as numbers if first char is a number
[ https://issues.apache.org/jira/browse/YARN-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3009: Attachment: YARN-3009.20150108-1.patch Attaching the patch TimelineWebServices always parses primary and secondary filters as numbers if first char is a number Key: YARN-3009 URL: https://issues.apache.org/jira/browse/YARN-3009 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Chris K Wensel Assignee: Naganarasimha G R Attachments: YARN-3009.20150108-1.patch If you pass a filter value that starts with a number (7CCA...), the filter value will be parsed into the Number '7' causing the filter to fail the search. Should be noted the actual value as stored via a PUT operation is properly parsed and stored as a String. This manifests as a very hard to identify issue with DAGClient in Apache Tez and naming dags/vertices with alphanumeric guid values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3000) YARN_PID_DIR should be visible in yarn-env.sh
[ https://issues.apache.org/jira/browse/YARN-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268050#comment-14268050 ] Rohith commented on YARN-3000: -- Thanks [~aw] for mentioning the jira id. I will close the issue as Not a Problem. YARN_PID_DIR should be visible in yarn-env.sh - Key: YARN-3000 URL: https://issues.apache.org/jira/browse/YARN-3000 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 2.6.0 Reporter: Jeff Zhang Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3000.patch Currently YARN_PID_DIR only shows up in yarn-daemon.sh, which is not supposed to be the place for users to set up environment variables. IMO, yarn-env.sh is the place for users to set up environment variables, just like hadoop-env.sh, so it's better to put YARN_PID_DIR into yarn-env.sh (it can be put in a comment just like YARN_RESOURCEMANAGER_HEAPSIZE). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267936#comment-14267936 ] Varun Vasudev commented on YARN-160: The findbugs warnings are unrelated to the patch. nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Varun Vasudev Fix For: 2.7.0 Attachments: apache-yarn-160.0.patch, apache-yarn-160.1.patch, apache-yarn-160.2.patch, apache-yarn-160.3.patch As mentioned in YARN-2 *NM memory and CPU configs* Currently these values are coming from the config of the NM, we should be able to obtain those values from the OS (ie, in the case of Linux from /proc/meminfo /proc/cpuinfo). As this is highly OS dependent we should have an interface that obtains this information. In addition implementations of this interface should be able to specify a mem/cpu offset (amount of mem/cpu not to be avail as YARN resource), this would allow to reserve mem/cpu for the OS and other services outside of YARN containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3012) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing
[ https://issues.apache.org/jira/browse/YARN-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinh Hoang Mai updated YARN-3012: - Environment: Ubuntu 64bit (was: CentOS 64bit) Hadoop 2.6.0: Basic error “starting MRAppMaster” after installing - Key: YARN-3012 URL: https://issues.apache.org/jira/browse/YARN-3012 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Environment: Ubuntu 64bit Reporter: Dinh Hoang Mai Priority: Critical Fix For: 2.6.0 I have just started to work with Hadoop 2. After installing with basic configs, I always failed to run any examples. Has anyone seen this problem and please help me? This is the log 2015-01-08 01:52:01,599 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1420648881673_0004_01 2015-01-08 01:52:01,764 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131) at org.apache.hadoop.security.Groups.init(Groups.java:70) at org.apache.hadoop.security.Groups.init(Groups.java:66) at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271) at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:299) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1473) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129) ... 7 more Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native Method) at org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49) at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:39) ... 12 more 2015-01-08 01:52:01,767 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2637: -- Attachment: YARN-2637.30.patch userAMLimit logic included as well, now with a test :-) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.20.patch, YARN-2637.21.patch, YARN-2637.22.patch, YARN-2637.23.patch, YARN-2637.25.patch, YARN-2637.26.patch, YARN-2637.27.patch, YARN-2637.28.patch, YARN-2637.29.patch, YARN-2637.30.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (IteratorFiCaSchedulerApp i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() = getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info(Application + application.getApplicationId() + from user: + application.getUser() + activated in queue: + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM ( minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)