[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710760#comment-14710760 ] Hudson commented on YARN-4014: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2251 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2251/]) YARN-4014. Support user cli interface in for Application Priority. Contributed by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710758#comment-14710758 ] Hudson commented on YARN-4014: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2232 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2232/]) YARN-4014. Support user cli interface in for Application Priority. Contributed by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710765#comment-14710765 ] Hudson commented on YARN-4014: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #294 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/294/]) YARN-4014. Support user cli interface in for Application Priority. Contributed by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3874) Optimize and synchronize FS Reader and Writer Implementations
[ https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710850#comment-14710850 ] Varun Saxena commented on YARN-3874: [~sjlee0], [~djp], although this is not urgent, it would eventually need to go in as well. The patch would require rebasing now. Let me know once you have the bandwidth to look into it, and I will rebase it then. > Optimize and synchronize FS Reader and Writer Implementations > - > > Key: YARN-3874 > URL: https://issues.apache.org/jira/browse/YARN-3874 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3874-YARN-2928.01.patch, > YARN-3874-YARN-2928.02.patch, YARN-3874-YARN-2928.03.patch > > > Combine FS Reader and Writer Implementations and make them consistent with > each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3250) Support admin cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-3250: Attachment: 0003-YARN-3250.patch > Support admin cli interface in for Application Priority > --- > > Key: YARN-3250 > URL: https://issues.apache.org/jira/browse/YARN-3250 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, > 0003-YARN-3250.patch > > > Current Application Priority Manager supports only configuration via file. > To support runtime configurations for admin cli and REST, a common management > interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710853#comment-14710853 ] Rohith Sharma K S commented on YARN-3250: - Updated the patch fixing the review comments. Kindly review the updated patch. > Support admin cli interface in for Application Priority > --- > > Key: YARN-3250 > URL: https://issues.apache.org/jira/browse/YARN-3250 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, > 0003-YARN-3250.patch > > > Current Application Priority Manager supports only configuration via file. > To support runtime configurations for admin cli and REST, a common management > interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3972) Work Preserving AM Restart for MapReduce
[ https://issues.apache.org/jira/browse/YARN-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710927#comment-14710927 ] Srikanth Sampath commented on YARN-3972: After discussing with [~vvasudev], we are exploring using the YARN Service Registry for containers to locate the MR AppMaster. > Work Preserving AM Restart for MapReduce > > > Key: YARN-3972 > URL: https://issues.apache.org/jira/browse/YARN-3972 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Srikanth Sampath >Assignee: Raju Bairishetti > Attachments: WorkPreservingMRAppMaster.pdf > > > Providing a framework for work preserving AM is achieved in > [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489]. We would like > to take advantage of this for MapReduce(MR) applications. There are some > challenges which have been described in the attached document and few options > discussed. We solicit feedback from the community. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710942#comment-14710942 ] Rohith Sharma K S commented on YARN-3893: - Thanks [~bibinchundatt] for updating the patch. The patch looks mostly reasonable. Some comments on the patch # Is the {{isRMActive()}} check required? refreshAll is executed only if transitionToActive succeeds. In any case, if you do add it, the check should be common for both branches, i.e. *_if_else* # In the test, is the code below expecting transitionToActive to fail? If so, the RM state should not be Active. Why would the RM be Active if adminService fails to transition? {code} +try { + rm.adminService.transitionToActive(requestInfo); +} catch (Exception e) { + assertTrue("Error when transitioning to Active mode".contains(e + .getMessage())); +} +assertEquals(HAServiceState.ACTIVE, rm.getRMContext().getHAServiceState()); {code} # Have you verified the test locally? I suspect the test may exit in the middle since you are changing the scheduler configuration. The scheduler configuration is reloaded during transitionToStandby, which fails to load, and *System.exit* is called. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
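To make the second review point above concrete, if a failed transition is expected to leave the RM in standby, the assertion in the quoted test would presumably look something like the following (a sketch only, reusing the MockRM and HAServiceState names from the snippet above; not code from the attached patch):
{code}
try {
  rm.adminService.transitionToActive(requestInfo);
  fail("transitionToActive should have failed because refreshAll fails");
} catch (Exception e) {
  // expected: refreshAll fails due to the broken scheduler configuration
}
// A failed transition should not leave the RM reporting itself as Active.
assertEquals(HAServiceState.STANDBY, rm.getRMContext().getHAServiceState());
{code}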
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710946#comment-14710946 ] Srikanth Sampath commented on YARN-2571: What's the status of this patch, [~ste...@apache.org]? I am considering using the YARN registry for the MR AppMaster in YARN-3972 and want to take some learnings from here. > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Labels: BB2015-05-TBR > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710955#comment-14710955 ] Rohith Sharma K S commented on YARN-3893: - To be more clear on the 3rd point, the {{handleTransitionToStandBy}} call will exit if transitionToStandby fails. This transition may fail because active services are initialized during the transition. CS initialization loads the new capacity-scheduler conf, which results in a wrong default queue capacity value and causes the standby transition to fail. 4. Instead of having a separate class FatalEventCountDispatcher, can it be made inline? > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
[ https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710984#comment-14710984 ] Hadoop QA commented on YARN-4044: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 49s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 4s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 59s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 50m 39s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 92m 0s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.TestRM | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752166/0002-YARN-4044.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / af78767 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8903/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8903/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8903/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8903/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8903/console | This message was automatically generated. > Running applications information changes such as movequeue is not published > to TimeLine server > -- > > Key: YARN-4044 > URL: https://issues.apache.org/jira/browse/YARN-4044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch > > > SystemMetricsPublisher need to expose an appUpdated api to update any change > for a running application. > Events can be > - change of queue for a running application. > - change of application priority for a running application. 
> This ticket intends to handle both RM and timeline side changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo
[ https://issues.apache.org/jira/browse/YARN-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711024#comment-14711024 ] Naganarasimha G R commented on YARN-4078: - Yes [~rohithsharma] & [~varun_saxena], in most of the places it is handled; only in these 2 places is it typecast (and in AppInfo the cast is unguarded). But the point is: why do we even need a guarded check? Can't we expose both methods ({{getPendingResourceRequestForAttempt}} & {{getApplicationAttempt}}) in {{YarnScheduler}}? > Unchecked typecast to AbstractYarnScheduler in AppInfo > -- > > Key: YARN-4078 > URL: https://issues.apache.org/jira/browse/YARN-4078 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > > Currently getPendingResourceRequestForAttempt is present in > {{AbstractYarnScheduler}}. > *But in AppInfo, we are calling this method by typecasting it to > AbstractYarnScheduler, which is incorrect.* > Because if a custom scheduler is to be added, it will implement > YarnScheduler, not AbstractYarnScheduler. > This method should be moved to YarnScheduler or it should have a guarded > check like in other places (RMAppAttemptBlock.getBlackListedNodes) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
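For reference, the guarded-check alternative mentioned above would look roughly like this (a sketch only; how AppInfo obtains the scheduler, the attempt id, and where the result is stored are assumptions, not the actual AppInfo code):
{code}
// Guarded variant: only call getPendingResourceRequestForAttempt when the
// scheduler actually extends AbstractYarnScheduler, instead of casting blindly.
List<ResourceRequest> resourceRequests = Collections.emptyList();
ResourceScheduler scheduler = rm.getRMContext().getScheduler();
if (scheduler instanceof AbstractYarnScheduler) {
  resourceRequests = ((AbstractYarnScheduler) scheduler)
      .getPendingResourceRequestForAttempt(attemptId);
}
// The alternative raised above is to move getPendingResourceRequestForAttempt
// (and getApplicationAttempt) up into the YarnScheduler interface so that no
// cast is needed at all.
{code}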
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711036#comment-14711036 ] Varun Saxena commented on YARN-3893: A few additional comments: * In the exception block below, i.e. the exception block after the call to refreshAll, if {{YarnConfiguration.shouldRMFailFast(getConfig())}} is true, we merely post a fatal event and do not return or throw an exception. This leads to a success audit log for the transition to active being printed, which doesn't look correct, because we encountered a problem during the transition. We should either return or throw a ServiceFailedException here as well. Both are OK because the RM would be down later anyway, but I would prefer the exception. {code} 324 } catch (Exception e) { 325 if (isRMActive() && YarnConfiguration.shouldRMFailFast(getConfig())) { 326 rmContext.getDispatcher().getEventHandler() 327 .handle(new RMFatalEvent(RMFatalEventType.ACTIVE_REFRESH_FAIL, e)); 328 }else{ 329 rm.handleTransitionToStandBy(); 330 throw new ServiceFailedException( 331 "Error on refreshAll during transistion to Active", e); 332 } 333 } 334 RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive", 335 "RMHAProtocolService"); 336 } {code} * In TestRMHA, the import below is unused. {code} import io.netty.channel.MessageSizeEstimator.Handle; {code} * A nit: there should be a space before else. {code} 328 }else{ 329 rm.handleTransitionToStandBy(); {code} * In the test added, the assert is not required in the exception block after the first call to transitionToActive. * Maybe we can add an assert in the test for the service state being STANDBY after the call to transitionToActive with an incorrect capacity scheduler config and fail-fast being false. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
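A minimal sketch of the reworked catch block under the first suggestion above, keeping the names from the quoted snippet (throwing after posting the fatal event is the reviewer's stated preference, not existing code):
{code}
} catch (Exception e) {
  if (isRMActive() && YarnConfiguration.shouldRMFailFast(getConfig())) {
    // Still post the fatal event so the RM can shut itself down...
    rmContext.getDispatcher().getEventHandler()
        .handle(new RMFatalEvent(RMFatalEventType.ACTIVE_REFRESH_FAIL, e));
  } else {
    rm.handleTransitionToStandBy();
  }
  // ...but surface the failure to the caller in both branches so the
  // success audit log below is never reached for a failed transition.
  throw new ServiceFailedException(
      "Error on refreshAll during transition to Active", e);
}
RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
    "RMHAProtocolService");
{code}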
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711043#comment-14711043 ] Hadoop QA commented on YARN-2571: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12697782/YARN-2571-010.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eee0d45 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8905/console | This message was automatically generated. > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Labels: BB2015-05-TBR > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711050#comment-14711050 ] Shiwei Guo commented on YARN-3933: -- So should I open a new issue instead? > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711056#comment-14711056 ] Lavkesh Lahngir commented on YARN-3933: --- Is it related to this ? https://issues.apache.org/jira/browse/YARN-4067 > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711083#comment-14711083 ] Hadoop QA commented on YARN-3250: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 10s | The applied patch generated 3 new checkstyle issues (total was 17, now 20). | | {color:red}-1{color} | whitespace | 0m 4s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 6m 59s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 53m 50s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 111m 56s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752185/0003-YARN-3250.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eee0d45 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8904/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8904/console | This message was automatically generated. 
> Support admin cli interface in for Application Priority > --- > > Key: YARN-3250 > URL: https://issues.apache.org/jira/browse/YARN-3250 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, > 0003-YARN-3250.patch > > > Current Application Priority Manager supports only configuration via file. > To support runtime configurations for admin cli and REST, a common management > interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711132#comment-14711132 ] Shiwei Guo commented on YARN-3933: -- I think so, and so is [YARN-4045|https://issues.apache.org/jira/browse/YARN-4045]. The negative value in the root queue is caused by calls to updateRootQueueMetrics on the same containerId. Our cluster has the capacity to run 13000+ containers, but the web UI says that: - Containers Running: -26546 - Memory Used: -82.38 TB - VCores Used: -26451 Luckily it hasn't affected scheduling yet. > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
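Purely as an illustration of the duplicate-release problem described above (the method and field names below are assumptions, not the actual CapacityScheduler code), a guard against counting the same container twice could look like:
{code}
// Illustrative guard only: remember which containers have already been
// released so a second completedContainer() call for the same containerId
// does not decrement the root queue metrics twice.
private final Set<ContainerId> releasedContainers =
    Collections.newSetFromMap(new ConcurrentHashMap<ContainerId, Boolean>());

private void completedContainerOnce(RMContainer rmContainer,
    ContainerStatus status, RMContainerEventType event) {
  if (!releasedContainers.add(rmContainer.getContainerId())) {
    return; // already counted; avoid driving the metrics negative
  }
  completedContainer(rmContainer, status, event);
}
{code}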
[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
[ https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711147#comment-14711147 ] Sunil G commented on YARN-4044: --- Test case failures are not related. > Running applications information changes such as movequeue is not published > to TimeLine server > -- > > Key: YARN-4044 > URL: https://issues.apache.org/jira/browse/YARN-4044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch > > > SystemMetricsPublisher need to expose an appUpdated api to update any change > for a running application. > Events can be > - change of queue for a running application. > - change of application priority for a running application. > This ticket intends to handle both RM and timeline side changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711164#comment-14711164 ] Varun Saxena commented on YARN-3893: Moreover, the fail-fast configuration doesn't quite work as expected here. If the capacity scheduler configuration is wrong, initialization will again fail and the JVM will exit, which in essence is exactly the same as the other case. We can handle the fail-fast-as-true case the same way as earlier, IMO. The reason it works in the test (the JVM does not exit) is that you have passed a CapacitySchedulerConfiguration object to MockRM. As CapacitySchedulerConfiguration is not an instanceof YarnConfiguration, this leads to a new YarnConfiguration object being created and passed to ResourceManager. When you change the configuration in the test and set the queue capacity to 200, it is not reflected in the Configuration object in the ResourceManager class. That is why the JVM does not exit when we transition to standby. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
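The configuration-copy behaviour described above can be illustrated roughly as follows (a sketch of the mechanism only; the property name and the exact wrapping code in ResourceManager/MockRM are assumptions):
{code}
// MockRM is given a CapacitySchedulerConfiguration, which is not a
// YarnConfiguration, so the RM wraps it in a brand-new YarnConfiguration.
Configuration testConf = new CapacitySchedulerConfiguration();
Configuration rmConf = (testConf instanceof YarnConfiguration)
    ? testConf : new YarnConfiguration(testConf);

// The wrap copies the properties by value, so later changes made by the
// test to testConf (e.g. setting a queue capacity to 200) never reach
// rmConf; this is why the standby transition does not see the broken
// capacity and the JVM does not exit.
testConf.setFloat("yarn.scheduler.capacity.root.default.capacity", 200f);
{code}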
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711171#comment-14711171 ] Varun Saxena commented on YARN-3893: Sorry, I meant we can handle the fail-fast config being *false* case the same way as we were doing in earlier patches. Otherwise checking for fail-fast doesn't make any difference because both code paths lead to the same result. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4013) Publisher V2 should write the unmanaged AM flag and application priority
[ https://issues.apache.org/jira/browse/YARN-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G resolved YARN-4013. --- Resolution: Won't Fix Already handled in YARN-4058 > Publisher V2 should write the unmanaged AM flag and application priority > > > Key: YARN-4013 > URL: https://issues.apache.org/jira/browse/YARN-4013 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Sunil G > > Upon rebase the branch, I find we need to redo the similar work for V2 > publisher: > https://issues.apache.org/jira/browse/YARN-3543 > Also Application priority can be published along with this. YARN-3948 for > reference. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711181#comment-14711181 ] Junping Du commented on YARN-3011: -- I see. Thanks Varun for the reminder. "all daemons it should be explicitly set to true so that daemons can crash instead of hanging around" is not wrong, but it could make the system more fragile if we fail to catch all possible recoverable or unrecoverable (but not global) exceptions like the one in this JIRA. We may need to think more about this. > NM dies because of the failure of resource localization > --- > > Key: YARN-3011 > URL: https://issues.apache.org/jira/browse/YARN-3011 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Wang Hao >Assignee: Varun Saxena > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3011.001.patch, YARN-3011.002.patch, > YARN-3011.003.patch, YARN-3011.004.patch > > > NM dies because of IllegalArgumentException when localize resource. > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, > 1416997035456, FILE, null } > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, > 1419831474153, FILE, null } > 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:135) > at org.apache.hadoop.fs.Path.(Path.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > > at java.lang.Thread.run(Thread.java:745) > 2014-12-29 13:43:58,701 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user hadoop > 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: > Exiting, bbye.. > 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting > connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711196#comment-14711196 ] Varun Saxena commented on YARN-3011: [~djp], the only thing we can do here is read this value from configuration and set it to true in daemons only if it is not configured. This way, in production clusters, if there is an exception that is crashing the daemon frequently and we find that it is not a very big issue (i.e. the daemon can still work normally), we can at least set the configuration to false in the config file. Right now, even that option is not there. Thoughts? I can probably raise a JIRA for this and the discussion (even if it is not fixed) can carry on there. > NM dies because of the failure of resource localization > --- > > Key: YARN-3011 > URL: https://issues.apache.org/jira/browse/YARN-3011 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Wang Hao >Assignee: Varun Saxena > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3011.001.patch, YARN-3011.002.patch, > YARN-3011.003.patch, YARN-3011.004.patch > > > NM dies because of IllegalArgumentException when localize resource. > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, > 1416997035456, FILE, null } > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, > 1419831474153, FILE, null } > 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:135) > at org.apache.hadoop.fs.Path.(Path.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > > at java.lang.Thread.run(Thread.java:745) > 2014-12-29 13:43:58,701 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user hadoop > 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: > Exiting, bbye.. > 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting > connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
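A sketch of the suggestion above, assuming the existing {{yarn.dispatcher.exit-on-error}} property ({{Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY}}); where exactly in the daemon startup this would live is an assumption:
{code}
// Instead of unconditionally forcing exit-on-error to true in the daemons,
// honour an explicit value from the config file and only default to true
// when the admin has not set anything.
boolean exitOnError =
    conf.getBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);
conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, exitOnError);
{code}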
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711201#comment-14711201 ] Rohith Sharma K S commented on YARN-3893: - There are 2 types of refresh that can happen, i.e. 1. yarn-site.xml refresh, 2. scheduler configuration refresh. Scheduler configurations are reloaded on every service initialization, which is by design. If there is any issue in the scheduler configuration, the fail-fast configuration behaves the same for both true and false. The fail-fast configuration is useful when the admin makes a mistake in yarn-site.xml. With a wrong configuration in yarn-site.xml, the RM service can be up, whereas with a wrong scheduler configuration the service can NOT be up at all. *On a best-effort basis to keep the service up*, exception handling for yarn-site.xml and for the scheduler configuration is different. BTW, putting the RM into StandBy would fill up the logs very soon because the elector continuously tries to make it active. For any configuration issue, it is better to exit the JVM and notify the admin that the RM is down so that the admin can check the logs and identify it. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711274#comment-14711274 ] Varun Saxena commented on YARN-3893: Hmm... my point of view is based on the fact that the service cannot be up unless at least one RM is active. A standby RM is not going to serve anything anyway. Until the configuration of this RM is corrected, whether yarn-site or scheduler configuration, this RM cannot become active anyway (refreshAll will always fail). And you can say there might be some silly mistake in the scheduler configuration too. What we were doing before in the patch won't fill up the logs if the configuration is OK on the other RM. And if it is not OK on the other RM, the logs will fill up even if refreshAll fails because of something other than the scheduler config (and fail-fast is false). fail-fast is true by default, and if the admin sets it to false, he will know what to expect. But you can say that an RM shutting down is a far more alarming thing for an admin, and that the scheduler configuration matters more. I agree with that. Maybe we can bring the RM with the wrong configuration down in all cases, because until the admin corrects the config (whether yarn-site or scheduler config), this RM cannot become active. Let us take the opinion of a couple of others on this as well. We can do whatever the consensus is. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711278#comment-14711278 ] Varun Saxena commented on YARN-3893: In previous patches, we were delaying reinitialization till attempting transition to active again and not attempting it immediately as we have done here. Any issues you expect with that ? > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711290#comment-14711290 ] Varun Saxena commented on YARN-3893: Saw your comments above. We cant do what we were doing earlier because as you say WebApp should be up even in standby. Let me think if something else can be done. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4079: --- Summary: Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons (was: Retrospect on the decision of making yarn.dispatcher.exit-on-error as explicitly true in daemons) > Retrospect on the decision of making yarn.dispatcher.exit-on-error as true > explicitly in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as explicitly true in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4079: --- Summary: Retrospect on the decision of making yarn.dispatcher.exit-on-error as explicitly true in daemons (was: Retrospect on making yarn.dispatcher.exit-on-error as explicitly true in daemons) > Retrospect on the decision of making yarn.dispatcher.exit-on-error as > explicitly true in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4079) Retrospect on making yarn.dispatcher.exit-on-error as explicitly true in daemons
Varun Saxena created YARN-4079: -- Summary: Retrospect on making yarn.dispatcher.exit-on-error as explicitly true in daemons Key: YARN-4079 URL: https://issues.apache.org/jira/browse/YARN-4079 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.1 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4079: --- Description: Currently in all daemons this config is explicitly set to true so that daemons can crash instead of hanging around. While this seems to be correct, as a recoverable exception should be caught and handled and NOT leaked through to AsyncDispatcher. And a non recoverable one should lead to a crash anyways. But this can make system more fragile in case we miss to catch all recoverable exceptions. Currently we do not even have an option of setting it to false in configuration, even if we would want. Probably we can read this value from configuration and set it to true in daemons if not configured. This way in production clusters if there is an exception which is leading to the daemon crashing frequently and we find that its unavoidable but not a very big issue(i.e daemon can still work normally for most part), we can atleast set the configuration to false in config file. > Retrospect on the decision of making yarn.dispatcher.exit-on-error as true > explicitly in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > > Currently in all daemons this config is explicitly set to true so that > daemons can crash instead of hanging around. While this seems to be correct, > as a recoverable exception should be caught and handled and NOT leaked > through to AsyncDispatcher. And a non recoverable one should lead to a crash > anyways. > But this can make system more fragile in case we miss to catch all > recoverable exceptions. > Currently we do not even have an option of setting it to false in > configuration, even if we would want. > Probably we can read this value from configuration and set it to true in > daemons if not configured. > This way in production clusters if there is an exception which is leading to > the daemon crashing frequently and we find that its unavoidable but not a > very big issue(i.e daemon can still work normally for most part), we can > atleast set the configuration to false in config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711415#comment-14711415 ] Varun Saxena commented on YARN-4079: cc [~djp] > Retrospect on the decision of making yarn.dispatcher.exit-on-error as true > explicitly in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > > Currently in all daemons this config is explicitly set to true so that > daemons can crash instead of hanging around. While this seems to be correct, > as a recoverable exception should be caught and handled and NOT leaked > through to AsyncDispatcher. And a non recoverable one should lead to a crash > anyways. > But this can make system more fragile in case we miss to catch all > recoverable exceptions. > Currently we do not even have an option of setting it to false in > configuration, even if we would want. > Probably we can read this value from configuration and set it to true in > daemons if not configured. > This way in production clusters if there is an exception which is leading to > the daemon crashing frequently and we find that its unavoidable but not a > very big issue(i.e daemon can still work normally for most part), we can > atleast set the configuration to false in config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711414#comment-14711414 ] Varun Saxena commented on YARN-4079: This JIRA has been raised based on the discussion on YARN-3011 (https://issues.apache.org/jira/browse/YARN-3011?focusedCommentId=1471&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1471). We can probably decide here if we want to handle it as above or not. > Retrospect on the decision of making yarn.dispatcher.exit-on-error as true > explicitly in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > > Currently in all daemons this config is explicitly set to true so that > daemons can crash instead of hanging around. While this seems to be correct, > as a recoverable exception should be caught and handled and NOT leaked > through to AsyncDispatcher. And a non recoverable one should lead to a crash > anyways. > But this can make system more fragile in case we miss to catch all > recoverable exceptions. > Currently we do not even have an option of setting it to false in > configuration, even if we would want. > Probably we can read this value from configuration and set it to true in > daemons if not configured. > This way in production clusters if there is an exception which is leading to > the daemon crashing frequently and we find that its unavoidable but not a > very big issue(i.e daemon can still work normally for most part), we can > atleast set the configuration to false in config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711431#comment-14711431 ] Junping Du commented on YARN-4079: -- Thanks for filing this JIRA, [~varun_saxena]. bq. Probably we can read this value from configuration and set it to true in daemons if not configured. This way in production clusters if there is an exception which is leading to the daemon crashing frequently and we find that its unavoidable but not a very big issue(i.e daemon can still work normally for most part), we can atleast set the configuration to false in config file. I don't mean to simply make this configuration public and allow user to specify false to disable exit-on-failure when exception happen. This could make things worse if critical exceptions happen but NMs/RM are still running as normal. We should think more on this. > Retrospect on the decision of making yarn.dispatcher.exit-on-error as true > explicitly in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > > Currently in all daemons this config is explicitly set to true so that > daemons can crash instead of hanging around. While this seems to be correct, > as a recoverable exception should be caught and handled and NOT leaked > through to AsyncDispatcher. And a non recoverable one should lead to a crash > anyways. > But this can make system more fragile in case we miss to catch all > recoverable exceptions. > Currently we do not even have an option of setting it to false in > configuration, even if we would want. > Probably we can read this value from configuration and set it to true in > daemons if not configured. > This way in production clusters if there is an exception which is leading to > the daemon crashing frequently and we find that its unavoidable but not a > very big issue(i.e daemon can still work normally for most part), we can > atleast set the configuration to false in config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711462#comment-14711462 ] Varun Saxena commented on YARN-4079: Hmm...I wasnt necessarily thinking of making it public. Just adding a way for it to be read from config so that it can be set to false if required(in rare scenarios) temporarily. But is there something else we can do ? Maybe we can add an exclusion list for which exceptions to be ignored. But the same exception might be a very critical bug in one area of code and not in other. So that may not be a viable alternative as well. > Retrospect on the decision of making yarn.dispatcher.exit-on-error as true > explicitly in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > > Currently in all daemons this config is explicitly set to true so that > daemons can crash instead of hanging around. While this seems to be correct, > as a recoverable exception should be caught and handled and NOT leaked > through to AsyncDispatcher. And a non recoverable one should lead to a crash > anyways. > But this can make system more fragile in case we miss to catch all > recoverable exceptions. > Currently we do not even have an option of setting it to false in > configuration, even if we would want. > Probably we can read this value from configuration and set it to true in > daemons if not configured. > This way in production clusters if there is an exception which is leading to > the daemon crashing frequently and we find that its unavoidable but not a > very big issue(i.e daemon can still work normally for most part), we can > atleast set the configuration to false in config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
[ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711464#comment-14711464 ] Varun Saxena commented on YARN-4079: Let us see what others think about how to handle this config. > Retrospect on the decision of making yarn.dispatcher.exit-on-error as true > explicitly in daemons > > > Key: YARN-4079 > URL: https://issues.apache.org/jira/browse/YARN-4079 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > > Currently in all daemons this config is explicitly set to true so that > daemons can crash instead of hanging around. While this seems to be correct, > as a recoverable exception should be caught and handled and NOT leaked > through to AsyncDispatcher. And a non recoverable one should lead to a crash > anyways. > But this can make system more fragile in case we miss to catch all > recoverable exceptions. > Currently we do not even have an option of setting it to false in > configuration, even if we would want. > Probably we can read this value from configuration and set it to true in > daemons if not configured. > This way in production clusters if there is an exception which is leading to > the daemon crashing frequently and we find that its unavoidable but not a > very big issue(i.e daemon can still work normally for most part), we can > atleast set the configuration to false in config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4080) Capacity planning for long running services on YARN
MENG DING created YARN-4080: --- Summary: Capacity planning for long running services on YARN Key: YARN-4080 URL: https://issues.apache.org/jira/browse/YARN-4080 Project: Hadoop YARN Issue Type: Improvement Components: api, resourcemanager Reporter: MENG DING YARN-1197 addresses the functionality of container resource resize. One major use case of this feature is for long running services managed by Slider to dynamically flex up and down resource allocation of individual components (e.g., HBase region server), based on application metrics/alerts obtained through third-party monitoring and policy engine. One key issue with increasing container resource at any point of time is that the additional resource needed by the application component may not be available *on the specific node*. In this case, we need to rely on preemption logic to reclaim the required resource back from other (preemptable) applications running on the same node. But this may not be possible today because: * preemption doesn't consider constraints of pending resource requests, such as hard locality requirements, user limits, etc (being addressed in YARN-2154 and possibly in YARN-3769?) * there may not be any preemptable container available due to the fact that no application is over its guaranteed capacity. What we need, ideally, is a way for YARN to support future capacity planning of long running services. At the minimum, we need to provide a way to let YARN know about the resource usage prediction/pattern of a long running service. And given this knowledge, YARN should be able to preempt resources from other applications to accommodate the resource needs of the long running service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3816: - Attachment: YARN-3816-YARN-2928-v1.patch Update PoC patch with following updates: - rebase patch according to latest updates on YARN-2928 (application table, reader API, etc.) - add reader api for read aggregation metrics - some refactor work Haven't include following updates (will do in next patch): - add configuration to enable/disable accumulation of aggregation metrics (AREA calculation) - address some important comments above - tests in TestHBaseTimelineStorage - other unit tests. > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711592#comment-14711592 ] Junping Du commented on YARN-3816: -- bq. When we are doing a sum operation, what if the value after the sum is outside the range of the data type? Do we assume it will be within limits? Especially aggregation values over a longer time period may well go beyond limits. That's a very good point, Varun! I think we can assume the calculations will stay within limits in most cases, and that a proper exception will get thrown if a value goes out of range. What do you think? > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
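As a side note on the overflow point above, the sketch below shows one way the "fail loudly instead of silently wrapping" behavior could look for a long-valued metric sum. The method name and where such a check would live are purely illustrative and are not taken from the attached patch.
{code}
// Overflow-checked summation of aggregated metric values (illustration only).
// Two same-signed operands whose sum flips sign indicate long overflow.
public static long sumChecked(long[] values) {
  long total = 0L;
  for (long v : values) {
    long next = total + v;
    if (((total ^ next) & (v ^ next)) < 0) {
      throw new ArithmeticException("metric aggregation overflowed the long range");
    }
    total = next;
  }
  return total;
}
{code}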
[jira] [Commented] (YARN-4080) Capacity planning for long running services on YARN
[ https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711622#comment-14711622 ] MENG DING commented on YARN-4080: - Not sure if the title accurately reflects the problem. If you think there is a better way to describe the problem, please suggest. For the use case presented in the description, one possible direction to consider is something like a dynamic host-based reservation (note, this is not the same as the current container reservation in YARN), for example: * when asking for resource requirement, one can specify the initial resource capability, and a reserved resource capability on whatever host that the container is launched on. For example, I can say I want 2GB of initial resource for a container, and once that container is launched, reserve up to 16GB of resource for the container on that host, as I expect the resource usage of the container will fluctuate over time, and will sometime peak at 16GB. * if this reserved resource is not fully utilized, it can still be allocated to other applications, but the scheduler will indicate that the allocated resource is revocable, such that no critical service should use this chunk of resource * when scheduler is allocating new resource, it should first consider resource that has not been reserved * preemption logic should also preempt these kind of revocable resource if needed The above is similar to the dynamic reservation feature being implemented in Mesos: https://issues.apache.org/jira/browse/MESOS-2018 I also took a look at YARN-1051 to see if the current reservation system in YARN could help with this situation, but to the best of my knowledge, it seems to mainly address applications with a future start time and a predictable deadline. Please correct me if I am wrong. Let me know if you have any thoughts, comments or ideas. > Capacity planning for long running services on YARN > --- > > Key: YARN-4080 > URL: https://issues.apache.org/jira/browse/YARN-4080 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, resourcemanager >Reporter: MENG DING > > YARN-1197 addresses the functionality of container resource resize. One major > use case of this feature is for long running services managed by Slider to > dynamically flex up and down resource allocation of individual components > (e.g., HBase region server), based on application metrics/alerts obtained > through third-party monitoring and policy engine. > One key issue with increasing container resource at any point of time is that > the additional resource needed by the application component may not be > available *on the specific node*. In this case, we need to rely on preemption > logic to reclaim the required resource back from other (preemptable) > applications running on the same node. But this may not be possible today > because: > * preemption doesn't consider constraints of pending resource requests, such > as hard locality requirements, user limits, etc (being addressed in YARN-2154 > and possibly in YARN-3769?) > * there may not be any preemptable container available due to the fact that > no application is over its guaranteed capacity. > What we need, ideally, is a way for YARN to support future capacity planning > of long running services. At the minimum, we need to provide a way to let > YARN know about the resource usage prediction/pattern of a long running > service. 
And given this knowledge, YARN should be able to preempt resources > from other applications to accommodate the resource needs of the long running > service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
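To make the "initial capability plus reserved ceiling on the chosen host" idea above easier to picture, here is a purely hypothetical data shape. Nothing like this exists in the current ResourceRequest API; the class and field names are invented solely to illustrate the proposal.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical request shape: start small, but hold a revocable per-host ceiling.
public class HostReservedResourceRequest {
  private final Resource initialCapability;  // e.g. 2 GB / 1 vcore at launch
  private final Resource reservedCeiling;    // e.g. up to 16 GB reserved on the chosen host

  public HostReservedResourceRequest(Resource initialCapability, Resource reservedCeiling) {
    this.initialCapability = initialCapability;
    this.reservedCeiling = reservedCeiling;
  }

  public Resource getInitialCapability() { return initialCapability; }
  public Resource getReservedCeiling() { return reservedCeiling; }
}
{code}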
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711631#comment-14711631 ] Steve Loughran commented on YARN-2571: -- All the registry code is there (Slider uses it), except for two bits on the RM side # create the base user path on app launch (in case the app needs it). This needs to be done by a process with the right permissions on ZK; it also makes sure that the user path is created with the perms that allow the RM/admin to delete it # purge entries on container/AM failure There was push-back from the YARN team on #2; not for the RM. I do still think #2 is needed. Irrespective of that, there is a main() entry point in the 2.7+ code which offers a CLI to create the registry; it's just without docs or tests. Email me directly if you want to start using the code & I'll help you. > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Labels: BB2015-05-TBR > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4080) Capacity planning for long running services on YARN
[ https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-4080: Description: YARN-1197 addresses the functionality of container resource resize. One major use case of this feature is for long running services managed by Slider to dynamically flex up and down resource allocation of individual components (e.g., HBase region server), based on application metrics/alerts obtained through third-party monitoring and policy engine. One key issue with increasing container resource at any point of time is that the additional resource needed by the application component may not be available *on the specific node*. In this case, we need to rely on preemption logic to reclaim the required resource back from other (preemptable) applications running on the same node. But this may not be possible today because: * preemption doesn't consider constraints of pending resource requests, such as hard locality requirements, user limits, etc (being addressed in YARN-2154 and possibly in YARN-3769?) * there may not be any preemptable container available due to the fact that no queue is over its guaranteed capacity. What we need, ideally, is a way for YARN to support future capacity planning of long running services. At the minimum, we need to provide a way to let YARN know about the resource usage prediction/pattern of a long running service. And given this knowledge, YARN should be able to preempt resources from other applications to accommodate the resource needs of the long running service. was: YARN-1197 addresses the functionality of container resource resize. One major use case of this feature is for long running services managed by Slider to dynamically flex up and down resource allocation of individual components (e.g., HBase region server), based on application metrics/alerts obtained through third-party monitoring and policy engine. One key issue with increasing container resource at any point of time is that the additional resource needed by the application component may not be available *on the specific node*. In this case, we need to rely on preemption logic to reclaim the required resource back from other (preemptable) applications running on the same node. But this may not be possible today because: * preemption doesn't consider constraints of pending resource requests, such as hard locality requirements, user limits, etc (being addressed in YARN-2154 and possibly in YARN-3769?) * there may not be any preemptable container available due to the fact that no application is over its guaranteed capacity. What we need, ideally, is a way for YARN to support future capacity planning of long running services. At the minimum, we need to provide a way to let YARN know about the resource usage prediction/pattern of a long running service. And given this knowledge, YARN should be able to preempt resources from other applications to accommodate the resource needs of the long running service. > Capacity planning for long running services on YARN > --- > > Key: YARN-4080 > URL: https://issues.apache.org/jira/browse/YARN-4080 > Project: Hadoop YARN > Issue Type: Improvement > Components: api, resourcemanager >Reporter: MENG DING > > YARN-1197 addresses the functionality of container resource resize. 
One major > use case of this feature is for long running services managed by Slider to > dynamically flex up and down resource allocation of individual components > (e.g., HBase region server), based on application metrics/alerts obtained > through third-party monitoring and policy engine. > One key issue with increasing container resource at any point of time is that > the additional resource needed by the application component may not be > available *on the specific node*. In this case, we need to rely on preemption > logic to reclaim the required resource back from other (preemptable) > applications running on the same node. But this may not be possible today > because: > * preemption doesn't consider constraints of pending resource requests, such > as hard locality requirements, user limits, etc (being addressed in YARN-2154 > and possibly in YARN-3769?) > * there may not be any preemptable container available due to the fact that > no queue is over its guaranteed capacity. > What we need, ideally, is a way for YARN to support future capacity planning > of long running services. At the minimum, we need to provide a way to let > YARN know about the resource usage prediction/pattern of a long running > service. And given this knowledge, YARN should be able to preempt resources > from other applications to accommodate the resource needs of the long running > service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4081) Add support for multiple resource types in the Resource class
Varun Vasudev created YARN-4081: --- Summary: Add support for multiple resource types in the Resource class Key: YARN-4081 URL: https://issues.apache.org/jira/browse/YARN-4081 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev For adding support for multiple resource types, we need to add support for this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4081: Attachment: YARN-4081-YARN-3926.001.patch Uploaded a patch with support for multiple resource types in the Resource class. > Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
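For context on what "multiple resource types in the Resource class" means, the sketch below contrasts today's memory/vcores-only API with the rough direction a generalized, name-keyed API could take. The string-keyed accessors are hypothetical here and are not a description of the attached patch.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class ResourceTypesSketch {
  public static void main(String[] args) {
    // Existing API: memory (MB) and virtual cores only.
    Resource res = Resource.newInstance(4 * 1024, 8);
    System.out.println(res.getMemory() + " MB, " + res.getVirtualCores() + " vcores");

    // Hypothetical generalized accessors keyed by resource-type name
    // (illustrative of the YARN-3926 direction, not the actual patch):
    // res.setResourceValue("disk", 2L);
    // long disk = res.getResourceValue("disk");
  }
}
{code}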
[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles
[ https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711651#comment-14711651 ] Varun Vasudev commented on YARN-3926: - I've created a YARN-3926 branch for this feature. > Extend the YARN resource model for easier resource-type management and > profiles > --- > > Key: YARN-3926 > URL: https://issues.apache.org/jira/browse/YARN-3926 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Proposal for modifying resource model and profiles.pdf > > > Currently, there are efforts to add support for various resource-types such > as disk(YARN-2139), network(YARN-2140), and HDFS bandwidth(YARN-2681). These > efforts all aim to add support for a new resource type and are fairly > involved efforts. In addition, once support is added, it becomes harder for > users to specify the resources they need. All existing jobs have to be > modified, or have to use the minimum allocation. > This ticket is a proposal to extend the YARN resource model to a more > flexible model which makes it easier to support additional resource-types. It > also considers the related aspect of “resource profiles” which allow users to > easily specify the various resources they need for any given container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711652#comment-14711652 ] Varun Vasudev commented on YARN-4081: - [~leftnoteasy], [~asuresh], [~jianhe] - can you please review? > Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-8.patch Rebase > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Assignee: Inigo Goiri >Priority: Minor > Labels: BB2015-05-TBR, containers, metrics, windows > Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, > YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, > YARN-3458-8.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
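For readers skimming the patch history, the core of the CpuTimeTracker approach (with 1 jiffy = 1 ms on Windows) boils down to the calculation sketched below: usage percent is the delta of cumulative CPU time over the sampling interval, where 100 means one fully busy core. This is a simplified restatement, not the code from the patch.
{code}
// Simplified restatement of the CPU usage calculation (not the patch itself).
// prev/cur CPU times are cumulative milliseconds (1 jiffy = 1 ms); samples are wall-clock ms.
public static float cpuUsagePercent(long prevCpuTimeMs, long curCpuTimeMs,
                                    long prevSampleMs, long curSampleMs) {
  long elapsed = curSampleMs - prevSampleMs;
  if (elapsed <= 0) {
    return -1.0f;   // not enough samples yet
  }
  long busy = curCpuTimeMs - prevCpuTimeMs;
  return (busy * 100.0f) / elapsed;   // 100 == one core fully busy
}
{code}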
[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2
[ https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711671#comment-14711671 ] Junping Du commented on YARN-4061: -- Thanks [~gtCarrera9] for working on a document for this. I have some high-level comments on the current design before moving to the details: 1. We should be very careful about using HDFS to cache incremental updates, i.e. incoming timeline entities. HDFS itself is not optimized for random-write performance, especially with a large number of writers (assuming each NM has a TimelineWriter). 2. Implementing a redo log on top of HDFS is very complicated, and it would achieve much the same goal as the WAL (Write Ahead Log) in HBase, wouldn't it? If so, do we plan to borrow code/components from HBase for this? 3. I think making HDFS serve as a backup storage makes more sense. > [Fault tolerance] Fault tolerant writer for timeline v2 > --- > > Key: YARN-4061 > URL: https://issues.apache.org/jira/browse/YARN-4061 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: FaulttolerantwriterforTimelinev2.pdf > > > We need to build a timeline writer that can be resistant to backend storage > down time and timeline collector failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711707#comment-14711707 ] Hadoop QA commented on YARN-3458: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 20s | Post-patch findbugs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common compilation is broken. | | {color:green}+1{color} | findbugs | 0m 20s | The patch does not introduce any new Findbugs (version ) warnings. | | {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-common. | | | | 39m 15s | | \\ \\ || Reason || Tests || | Failed build | hadoop-yarn-common | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752275/YARN-3458-8.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / eee0d45 | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8906/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8906/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8906/console | This message was automatically generated. > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Assignee: Inigo Goiri >Priority: Minor > Labels: BB2015-05-TBR, containers, metrics, windows > Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, > YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, > YARN-3458-8.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711711#comment-14711711 ] Vinod Kumar Vavilapalli commented on YARN-2884: --- [~jianhe] mentioned this offline and the configuration approach concerns me too. Stepping back, I think the current discovery of Scheduler by the apps is completely broken. Distributed Shell for e.g. works only because it is a java application and NM happens to put HADOOP_CONF_DIR in the classpath. Irrespective of this JIRA, we need to fix the scheduler discovery for the apps. The current way of depending on server configuration is unreliable in the face of rolling-upgrades. The specific solution in this JIRA further breaks rolling-upgrades and configuration updates. If and when, an admin forces client configuration changes, the config written by the Node will go out of sync. This overall makes the situation worse. I'd suggest that we start moving towards a better scheduler-discovery model. We have already done similar work with Timeline service (YARN-3039). We can implement part of that here - an environment based discovery - we can simply have an environment say YARN_SCHEDULER_ADDRESS for now set by the NodeManager into the AM-env, that is respected as the first level discovery mechanism. As we add more first-class discovery mechanisms, this env can take lesser precedence. This approach isn't too far from your current solution too, instead of pointing to a conf-dir env, you are pointing to a scheduler-address env directly. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
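As a rough illustration of the env-first discovery order suggested above, an AM-side lookup could work along these lines. The variable name YARN_SCHEDULER_ADDRESS comes from the comment; the fallback wiring and the class name are assumptions rather than an agreed design.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerDiscoverySketch {
  // Prefer an address injected into the AM environment; otherwise rely on
  // whatever yarn-site.xml the AM finds on its classpath.
  public static Configuration discoverScheduler() {
    Configuration conf = new YarnConfiguration();
    String envAddr = System.getenv("YARN_SCHEDULER_ADDRESS");  // proposed, set by the NM
    if (envAddr != null && !envAddr.isEmpty()) {
      conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, envAddr);
    }
    return conf;  // an AMRMClient initialized with this conf uses the discovered address
  }
}
{code}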
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711730#comment-14711730 ] Hadoop QA commented on YARN-3816: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 13s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 1s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 53s | The applied patch generated 8 additional warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 20s | The applied patch generated 13 new checkstyle issues (total was 38, now 51). | | {color:red}-1{color} | whitespace | 0m 16s | The patch has 20 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 46s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 9s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 1m 32s | Tests failed in hadoop-yarn-server-timelineservice. | | | | 51m 27s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-api | | FindBugs | module:hadoop-yarn-server-timelineservice | | Failed unit tests | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorage | | | hadoop.yarn.server.timelineservice.storage.TestFileSystemTimelineWriterImpl | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752267/YARN-3816-YARN-2928-v1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 3c36922 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8908/testReport/ | | Java | 
1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8908/console | This message was automatically generated. > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711738#comment-14711738 ] Benoit Sigoure commented on YARN-3238: -- What's the setting to tune down to avoid the 45min timeout? I'd like the code to fail fast. > Connection timeouts to nodemanagers are retried at multiple levels > -- > > Key: YARN-3238 > URL: https://issues.apache.org/jira/browse/YARN-3238 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3238.001.patch > > > The IPC layer will retry connection timeouts automatically (see Client.java), > but we are also retrying them with YARN's RetryPolicy put in place when the > NM proxy is created. This causes a two-level retry mechanism where the IPC > layer has already retried quite a few times (45 by default) for each YARN > RetryPolicy error that is retried. The end result is that NM clients can > wait a very, very long time for the connection to finally fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
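The 45-minute wait usually comes from the IPC-level connect retries (45 by default) multiplying the YARN-level NM-client retry window. The snippet below lists the properties typically involved; the values are illustrative only, and the names/defaults should be checked against yarn-default.xml and core-default.xml for the release in use.
{code}
<!-- Illustrative values only; verify the defaults for your Hadoop version. -->
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>60000</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
  <value>10000</value>
</property>
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>3</value>
</property>
{code}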
[jira] [Updated] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3717: Attachment: YARN-3717.20150825-1.patch Fixing the following test case failures related to the patch: TestNMClient.testNMClient, TestNMClient.testNMClientNoCleanupOnStop, TestYarnClient.testAMMRTokens. The other test failures are build issues. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label > expression for AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711766#comment-14711766 ] Subru Krishnan commented on YARN-2884: -- [~vinodkv], thanks for your feedback. Let me first reiterate what I said to [~jlowe]'s similar observation, I agree not only that we should move towards a better scheduler discovery model but completely decouple apps from platform configs. The reason we didn't go down the path you have suggested is it puts a dependency on updating all the AMs (which we don't own unlike Timeline service) to use the new discovery mechanism. The current approach though non-ideal is agnostic to AM. To force the AMs to do just that, we should prevent access to the NM's config. If all of you are OK with the consequence, I can go ahead and make the change. I think it'll be better if we open a separate JIRA to address the decoupling of app & platform config with an initial sub-task to handle scheduler discovery through environment as you suggested? In that case, we'll update the patch to remove the changes in ContainerLaunch that overrides the HADOOP_CONF_DIR and AFAIK, [~jianhe] is OK with rest of the patch which he can commit asap. This will unblock us to use AMRMProxy with at least self contained apps like MapReduce, Spark which is our major workload. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711804#comment-14711804 ] Hadoop QA commented on YARN-4081: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 17s | Findbugs (version ) appears to be broken on YARN-3926. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 34s | The applied patch generated 89 new checkstyle issues (total was 10, now 99). | | {color:red}-1{color} | whitespace | 0m 20s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 32s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 55m 6s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 101m 44s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-api | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752272/YARN-4081-YARN-3926.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-3926 / c95993c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8907/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8907/console | This message was automatically generated. 
> Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711863#comment-14711863 ] Subru Krishnan commented on YARN-2884: -- Just to add more context based on the offline discussions with @jian he, we can add a YARN_SCHEDULER_ADDRESS environment based scheduler discovery in the *AMRMClient* as an immediate first step. This will not cover all the AMs as AMRMClient is not used by custom AMs. Moreover apps can bring their own client JAR and the version can be older as long as it's backward compatible. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711941#comment-14711941 ] Hadoop QA commented on YARN-3717: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 50s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 8m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 59s | Site still builds. | | {color:red}-1{color} | checkstyle | 1m 57s | The applied patch generated 3 new checkstyle issues (total was 16, now 18). | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 7m 20s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 59s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 11s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 53m 36s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 121m 41s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-common | | Failed unit tests | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752284/YARN-3717.20150825-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / eee0d45 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8909/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8909/console | This message was automatically generated. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711987#comment-14711987 ] Jian He commented on YARN-2884: --- To make this move faster, I think we can have a separate jira to address the scheduler address discovery problem. At least, MR job can run without the change. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4082) Container shouldn't be killed when node's label updated.
Wangda Tan created YARN-4082: Summary: Container shouldn't be killed when node's label updated. Key: YARN-4082 URL: https://issues.apache.org/jira/browse/YARN-4082 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan From YARN-2920, containers will be killed if partition of a node changed. Instead of killing containers, we should update resource-usage-by-partition properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4082: - Component/s: (was: api) (was: client) (was: resourcemanager) capacityscheduler > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4082: - Component/s: (was: capacityscheduler) capacity scheduler > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4082: - Attachment: YARN-4082.1.patch Uploaded initial patch. > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4082.1.patch > > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4083) Add a discovery mechanism for the scheduler addresss
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan reassigned YARN-4083: Assignee: Subru Krishnan (was: Kishore Chaliparambil) > Add a discovery mechanism for the scheduler addresss > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4083) Add a discovery mechanism for the scheduler addresss
Subru Krishnan created YARN-4083: Summary: Add a discovery mechanism for the scheduler addresss Key: YARN-4083 URL: https://issues.apache.org/jira/browse/YARN-4083 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Subru Krishnan Assignee: Kishore Chaliparambil We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4083) Add a discovery mechanism for the scheduler addresss
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-4083: - Description: (was: We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new services running on the NM that provide a proxy to the central RM. This give us a place to: 1) perform distributed scheduling decisions 2) throttling mis-behaving AMs 3) mask the access to a federation of RMs) > Add a discovery mechanism for the scheduler addresss > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4083) Add a discovery mechanism for the scheduler addresss
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-4083: - Description: Today many apps like Distributed Shell, REEF, etc rely on the fact that the HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler address. This JIRA proposes the addition of an explicit discovery mechanism for the scheduler address > Add a discovery mechanism for the scheduler addresss > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > Today many apps like Distributed Shell, REEF, etc rely on the fact that the > HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler > address. This JIRA proposes the addition of an explicit discovery mechanism > for the scheduler address -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4083) Add a discovery mechanism for the scheduler addresss
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-4083: - Issue Type: Improvement (was: Sub-task) Parent: (was: YARN-2877) > Add a discovery mechanism for the scheduler addresss > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > Today many apps like Distributed Shell, REEF, etc rely on the fact that the > HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler > address. This JIRA proposes the addition of an explicit discovery mechanism > for the scheduler address -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler addresss
[ https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712206#comment-14712206 ] Subru Krishnan commented on YARN-4083: -- Based on the [discussion | https://issues.apache.org/jira/browse/YARN-2884?focusedCommentId=14711711] with [~jianhe], [~vinodkv], [~kishorch] and [~jlowe] in YARN-2884, I will implement an initial scheduler address discovery mechanism based on an environment variable, say YARN_SCHEDULER_ADDRESS. > Add a discovery mechanism for the scheduler addresss > > > Key: YARN-4083 > URL: https://issues.apache.org/jira/browse/YARN-4083 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > > Today many apps like Distributed Shell, REEF, etc rely on the fact that the > HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler > address. This JIRA proposes the addition of an explicit discovery mechanism > for the scheduler address -- This message was sent by Atlassian JIRA (v6.3.4#6332)
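A minimal sketch of what such environment-based discovery might look like on the client side is below. The variable name YARN_SCHEDULER_ADDRESS and the helper class are assumptions drawn from this discussion, not the committed design; the fallback simply reads today's yarn.resourcemanager.scheduler.address from the configuration on the classpath.
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerAddressDiscovery {
  // Hypothetical environment variable name discussed in YARN-2884 / YARN-4083.
  private static final String YARN_SCHEDULER_ADDRESS = "YARN_SCHEDULER_ADDRESS";

  /**
   * Resolve the AM-RM scheduler address: prefer an address injected by the NM
   * through the environment, otherwise fall back to the address found in the
   * configuration on the classpath (today's behaviour).
   */
  public static String resolveSchedulerAddress(YarnConfiguration conf) {
    String fromEnv = System.getenv(YARN_SCHEDULER_ADDRESS);
    if (fromEnv != null && !fromEnv.isEmpty()) {
      return fromEnv;
    }
    return conf.get(YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS);
  }
}
{code}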
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712210#comment-14712210 ] Varun Saxena commented on YARN-3816: [~djp], thanks for the patch. A few comments and questions.
# This pertains to what we are doing in YARN-4053. I see that we will be using a column qualifier postfix/suffix to identify whether a metric is an aggregated one or not. In your case, this would mean an OR filter of the form metric=0 OR metric0=1 while applying metric filters on the reader side. We were thinking of using a similar scheme to identify a metric as long or double. If we use the same scheme for long or double, we may end up with 4 ORs for a single metric. Maybe we can use cell tags for aggregation, or not support mixed data types. cc [~jrottinghuis].
# IIUC, the TimelineMetric#toAggregate flag would indicate whether a metric is to be aggregated or not. Maybe in TimelineCollector#aggregateMetrics, we should do aggregation only if the flag is enabled.
# In TimelineCollector#appendAggregatedMetricsToEntities, any reason we are creating separate TimelineEntity objects for each metric? Maybe create a single entity containing a set of metrics.
# 3 new maps have been introduced in TimelineCollector and these are used as the base to calculate the aggregated value. What if the daemon crashes?
# In TimelineMetricCalculator some functions have duplicate if conditions for long.
# In TimelineMetricCalculator#sum, to avoid negative values due to overflow, we can change conditions like below
{code}
if (n1 instanceof Integer) {
  return new Integer(n1.intValue() + n2.intValue());
}
{code}
to something like?
{code}
if (n1 instanceof Integer) {
  if (Integer.MAX_VALUE - n1.intValue() < n2.intValue()) {
    return new Long(n1.longValue() + n2.longValue());
  } else {
    return new Integer(n1.intValue() + n2.intValue());
  }
}
{code}
We need not support up to BigInteger or BigDecimal but, as you said above, we can throw an exception for unsupported types.
# In TimelineMetric#aggregateTo, maybe use getValues instead of getValuesJAXB?
# Also I was wondering if TimelineMetric#aggregateTo should be moved to some util class. TimelineMetric is part of the object model and exposed to the client. And IIUC aggregateTo won't be called by the client.
# What is EntityColumnPrefix#AGGREGATED_METRICS meant for?
> [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
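As a side note on point 6 above, a self-contained sketch of the overflow-safe promotion being suggested is below; the class and method names are illustrative and not taken from the patch under review.
{code}
public final class OverflowSafeSum {
  private OverflowSafeSum() {}

  /** Sum two Integer-typed metric values, promoting the result to Long on overflow. */
  public static Number sum(Number n1, Number n2) {
    if (n1 instanceof Integer && n2 instanceof Integer) {
      long wide = n1.longValue() + n2.longValue();
      if (wide > Integer.MAX_VALUE || wide < Integer.MIN_VALUE) {
        return Long.valueOf(wide);          // promote instead of wrapping around
      }
      return Integer.valueOf((int) wide);   // fits, keep the narrower type
    }
    // Other numeric types (Long, Double, ...) would be handled separately.
    return Long.valueOf(n1.longValue() + n2.longValue());
  }

  public static void main(String[] args) {
    System.out.println(sum(2000000000, 2000000000)); // 4000000000, returned as a Long
    System.out.println(sum(1, 2));                   // 3, stays an Integer
  }
}
{code}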
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712209#comment-14712209 ] Subru Krishnan commented on YARN-2884: -- Thanks [~jianhe], have created YARN-4083. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712220#comment-14712220 ] Varun Saxena commented on YARN-3816: BTW, the TimelineMetric#toAggregate flag is meant to indicate whether a metric needs to be aggregated. But are we planning to use it to indicate that a metric is an aggregated metric as well? If yes, we should probably set this flag for each metric processed in TimelineCollector#appendAggregatedMetricsToEntities. > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.
[ https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712224#comment-14712224 ] Hadoop QA commented on YARN-4082: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 50s | The applied patch generated 9 new checkstyle issues (total was 299, now 308). | | {color:red}-1{color} | whitespace | 0m 5s | The patch has 23 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 30s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 53m 50s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 93m 46s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752332/YARN-4082.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8911/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8911/console | This message was automatically generated. > Container shouldn't be killed when node's label updated. > > > Key: YARN-4082 > URL: https://issues.apache.org/jira/browse/YARN-4082 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4082.1.patch > > > From YARN-2920, containers will be killed if partition of a node changed. > Instead of killing containers, we should update resource-usage-by-partition > properly when node's partition updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712230#comment-14712230 ] Li Lu commented on YARN-3816: - bq. Also I was wondering if TimelineMetric#aggregateTo should be moved to some util class. TimelineMetric is part of the object model and exposed to the client. And IIUC aggregateTo won't be called by the client. Sorry, but I think putting the aggregateTo method here is fine. I don't really like the idea of putting these static methods into a util class just because they look like utils. This is more of a subjective topic, but I would like our util methods to be general enough for the entire module. Aggregating metrics is not like reversing integers in their binary representations, which is general enough for the whole module to count as a general "util". Here, aggregating metrics is clean enough to be a general operation on timeline metrics. I didn't get the part about the "called by client" discussion: our object model is used both by ourselves and by our clients, so why is "not called by clients" a problem for our object model (the offline aggregation, for example, will also use this aggregation method)? > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712261#comment-14712261 ] Varun Saxena commented on YARN-3816: Hmm... not being called by the client is not a problem. I did not mean that. I was primarily thinking of these classes as data classes with getters and setters, and functional logic detached from them. And this method is not using any member variables either. But yes, this method won't be generic enough at a global level. There is a point to that as well. Currently the aggregateTo method is not static. I think it should be made static even if it's kept inside TimelineMetric, as it's not using any member variables. > [Aggregation] App-level Aggregation for YARN system metrics > --- > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Junping Du > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712269#comment-14712269 ] Robert Kanter commented on YARN-3528: - [~brahma], have you had a chance to look at the testcase failures? > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins
[ https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712271#comment-14712271 ] Robert Kanter commented on YARN-3528: - Oops, wrong person. [~brahmareddy] > Tests with 12345 as hard-coded port break jenkins > - > > Key: YARN-3528 > URL: https://issues.apache.org/jira/browse/YARN-3528 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 > Environment: ASF Jenkins >Reporter: Steve Loughran >Assignee: Brahma Reddy Battula >Priority: Blocker > Labels: test > Attachments: YARN-3528-002.patch, YARN-3528-003.patch, > YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528.patch > > > A lot of the YARN tests have hard-coded the port 12345 for their services to > come up on. > This makes it impossible to have scheduled or precommit tests to run > consistently on the ASF jenkins hosts. Instead the tests fail regularly and > appear to get ignored completely. > A quick grep of "12345" shows up many places in the test suite where this > practise has developed. > * All {{BaseContainerManagerTest}} subclasses > * {{TestNodeManagerShutdown}} > * {{TestContainerManager}} > + others > This needs to be addressed through portscanning and dynamic port allocation. > Please can someone do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
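For reference, the usual alternatives to a hard-coded 12345 are either to bind the service under test to port 0 and read back the assigned port, or to probe for a free port up front. A minimal, generic sketch of the latter is below; it is not taken from the attached patches.
{code}
import java.io.IOException;
import java.net.ServerSocket;

public final class TestPorts {
  private TestPorts() {}

  /**
   * Ask the OS for a currently free port instead of hard-coding 12345.
   * Note: the port could be taken between this call and the service bind,
   * so binding the service directly to port 0 is the more robust option.
   */
  public static int getFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Free port: " + getFreePort());
  }
}
{code}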
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712308#comment-14712308 ] Rohith Sharma K S commented on YARN-3893: - Hi [~varun_saxena], trying to understand the point of your statement: my suggestion is to exit the RM if there is any configuration issue during refreshAll in {{AdminService#transitionToActive}}. As I gave the reason for bringing the RM JVM down rather than keeping the JVM alive in an earlier [comment|https://issues.apache.org/jira/browse/YARN-3893?focusedCommentId=14711201&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14711201], do you have any concern about exiting the RM for configuration issues? > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
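A minimal sketch of the fail-fast behaviour being proposed is below, assuming Hadoop's ExitUtil is used to bring the JVM down when refreshAll fails; the surrounding class and method are illustrative stand-ins, not the actual AdminService code.
{code}
import java.io.IOException;
import org.apache.hadoop.util.ExitUtil;

public class TransitionToActiveSketch {
  /** Illustrative stand-in for the refreshAll step (queues, ACLs, user-group mappings, ...). */
  private void refreshAll() throws IOException {
    // refreshQueues, refreshAdminAcls, refreshUserToGroupsMappings, ...
  }

  public void transitionToActive() {
    try {
      refreshAll();
    } catch (IOException e) {
      // Bad configuration here can leave both RMs claiming to be active;
      // terminating the JVM lets the standby take over cleanly instead.
      ExitUtil.terminate(-1,
          "refreshAll failed during transitionToActive: " + e.getMessage());
    }
  }
}
{code}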
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishore Chaliparambil updated YARN-2884: Attachment: YARN-2884-V10.patch Uploaded YARN-2884-V10.patch. The changes in ContainerLaunch has been removed. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, > YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, > YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4065) container-executor error should include effective user id
[ https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712391#comment-14712391 ] Casey Brotherton commented on YARN-4065: Absolutely. Will start working on it > container-executor error should include effective user id > - > > Key: YARN-4065 > URL: https://issues.apache.org/jira/browse/YARN-4065 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Casey Brotherton >Priority: Trivial > > When container-executor fails to access it's config file, the following > message will be thrown: > {code} > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container executor initialization is : 24 > ExitCodeException exitCode=24: Invalid conf file provided : > /etc/hadoop/conf/container-executor.cfg > {code} > The real problem may be a change in the container-executor not running as set > uid root. > From: > https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html > {quote} > The container-executor program must be owned by root and have the permission > set ---sr-s---. > {quote} > The error message could be improved by printing out the effective user id > with the error message, and possibly the executable trying to access the > config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3854) Add localization support for docker images
[ https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712426#comment-14712426 ] Jun Gong commented on YARN-3854: I think what we need is a private registry. Push the local image to the private registry when (or before) submitting the app, then the NM could pull it from the private registry. BTW: we could build the private registry using HDFS as the storage backend (https://github.com/hex108/docker-registry-driver-hdfs), and it works well in our cluster. > Add localization support for docker images > -- > > Key: YARN-3854 > URL: https://issues.apache.org/jira/browse/YARN-3854 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > > We need the ability to localize images from HDFS and load them for use when > launching docker containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712446#comment-14712446 ] Hadoop QA commented on YARN-2884: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 22m 8s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 8m 49s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 1s | The applied patch generated 1 new checkstyle issues (total was 237, now 237). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 51s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 7m 32s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 56m 17s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 124m 2s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752377/YARN-2884-V10.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a4d9acc | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8912/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8912/console | This message was automatically generated. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, > YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, > YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712477#comment-14712477 ] Sunil G commented on YARN-3250: --- The latest patch looks good to me. Could you please check whether the test failures are related or not? Thank you. > Support admin cli interface in for Application Priority > --- > > Key: YARN-3250 > URL: https://issues.apache.org/jira/browse/YARN-3250 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, > 0003-YARN-3250.patch > > > Current Application Priority Manager supports only configuration via file. > To support runtime configurations for admin cli and REST, a common management > interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
Ved Prakash Pandey created YARN-4084: Summary: Yarn should allow to skip hadoop-yarn-server-tests project from build.. Key: YARN-4084 URL: https://issues.apache.org/jira/browse/YARN-4084 Project: Hadoop YARN Issue Type: Bug Components: build Affects Versions: 2.7.1 Reporter: Ved Prakash Pandey For fast compilation one can try to skip the test code compilation by using {{-Dmaven.test.skip=true}}. But when yarn-project fails to compile when this option is used. This is because, it depends on hadoop-yarn-server-tests project. Below is the exception : {noformat} [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find attachment with classifier: tests in module project: org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this module from the module-set. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ved Prakash Pandey updated YARN-4084: - Priority: Minor (was: Major) > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But when yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ved Prakash Pandey updated YARN-4084: - Description: For fast compilation one can try to skip the test code compilation by using {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this option is used. This is because, it depends on hadoop-yarn-server-tests project. Below is the exception : {noformat} [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find attachment with classifier: tests in module project: org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this module from the module-set. {noformat} was: For fast compilation one can try to skip the test code compilation by using {{-Dmaven.test.skip=true}}. But when yarn-project fails to compile when this option is used. This is because, it depends on hadoop-yarn-server-tests project. Below is the exception : {noformat} [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find attachment with classifier: tests in module project: org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this module from the module-set. {noformat} > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ved Prakash Pandey updated YARN-4084: - Attachment: YARN-4084.patch > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..
[ https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712560#comment-14712560 ] Ved Prakash Pandey commented on YARN-4084: -- To fix this I have created a new profile called {{enable-yarn-server-test-module}} in the hadoop-yarn-server pom. To include this module, one has to pass {{-Penable-yarn-server-test-module}} during compilation. > Yarn should allow to skip hadoop-yarn-server-tests project from build.. > --- > > Key: YARN-4084 > URL: https://issues.apache.org/jira/browse/YARN-4084 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 2.7.1 >Reporter: Ved Prakash Pandey >Priority: Minor > Attachments: YARN-4084.patch > > > For fast compilation one can try to skip the test code compilation by using > {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this > option is used. This is because, it depends on hadoop-yarn-server-tests > project. > Below is the exception : > {noformat} > [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find > attachment with classifier: tests in module project: > org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this > module from the module-set. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishore Chaliparambil updated YARN-2884: Attachment: YARN-2884-V11.patch Removed the ApplicationConstants.java file from the patch because it is not required. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, > YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, > YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, > YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)