[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710760#comment-14710760
 ] 

Hudson commented on YARN-4014:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2251 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2251/])
YARN-4014. Support user cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
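
As context for the commit above, a minimal sketch of how a client could invoke the new application-priority update API that this change adds to YarnClient; the application id below is hypothetical and the exact method signature may differ between releases:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class UpdatePrioritySketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Hypothetical application id; in practice it would come from the CLI arguments.
    ApplicationId appId = ApplicationId.newInstance(1440000000000L, 1);

    // Ask the RM to change the application's priority to 10.
    yarnClient.updateApplicationPriority(appId, Priority.newInstance(10));

    yarnClient.stop();
  }
}
{code}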


> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710758#comment-14710758
 ] 

Hudson commented on YARN-4014:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2232 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2232/])
YARN-4014. Support user cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto


> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710765#comment-14710765
 ] 

Hudson commented on YARN-4014:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #294 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/294/])
YARN-4014. Support user cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java


> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3874) Optimize and synchronize FS Reader and Writer Implementations

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710850#comment-14710850
 ] 

Varun Saxena commented on YARN-3874:


[~sjlee0], [~djp], although not urgent, this would eventually need to go in as well.
The patch now requires rebasing. Let me know once you have the bandwidth to look into it, and I will rebase it then.

> Optimize and synchronize FS Reader and Writer Implementations
> -
>
> Key: YARN-3874
> URL: https://issues.apache.org/jira/browse/YARN-3874
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3874-YARN-2928.01.patch, 
> YARN-3874-YARN-2928.02.patch, YARN-3874-YARN-2928.03.patch
>
>
> Combine FS Reader and Writer Implementations and make them consistent with 
> each other.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-25 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3250:

Attachment: 0003-YARN-3250.patch

> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710853#comment-14710853
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Updated the patch to address the review comments. Kindly review the updated patch.

> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3972) Work Preserving AM Restart for MapReduce

2015-08-25 Thread Srikanth Sampath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710927#comment-14710927
 ] 

Srikanth Sampath commented on YARN-3972:


After discussing with [~vvasudev], I am exploring the use of the YARN Service Registry so that containers can locate the MR AppMaster (a rough sketch of the registry API follows below).
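
For context, a rough sketch of publishing and resolving a service record through the YARN registry so a container could locate the AM; the registry path and record contents here are hypothetical, not the actual MR integration:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.registry.client.api.BindFlags;
import org.apache.hadoop.registry.client.api.RegistryOperations;
import org.apache.hadoop.registry.client.api.RegistryOperationsFactory;
import org.apache.hadoop.registry.client.types.ServiceRecord;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RegistrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    RegistryOperations registry = RegistryOperationsFactory.createInstance(conf);
    registry.start();

    // Hypothetical registry path for the AM's record.
    String dir = "/users/alice/services/mapreduce";
    String path = dir + "/application_1440000000000_0001";

    // AM side: publish a record describing where the AM can be reached.
    registry.mknode(dir, true);
    ServiceRecord record = new ServiceRecord();
    record.description = "MR AppMaster endpoint (illustrative only)";
    registry.bind(path, record, BindFlags.OVERWRITE);

    // Container side: resolve the record to locate the AM.
    ServiceRecord resolved = registry.resolve(path);
    System.out.println("Resolved AM record: " + resolved);

    registry.stop();
  }
}
{code}

In practice the AM would add real endpoints (RPC/HTTP addresses) to the record, and containers would read those endpoints rather than printing the record.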

> Work Preserving AM Restart for MapReduce
> 
>
> Key: YARN-3972
> URL: https://issues.apache.org/jira/browse/YARN-3972
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Srikanth Sampath
>Assignee: Raju Bairishetti
> Attachments: WorkPreservingMRAppMaster.pdf
>
>
> Providing a framework for work preserving AM is achieved in 
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].  We would like 
> to take advantage of this for MapReduce(MR) applications.  There are some 
> challenges which have been described in the attached document and few options 
> discussed.  We solicit feedback from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710942#comment-14710942
 ] 

Rohith Sharma K S commented on YARN-3893:
-

Thanks [~bibinchundatt] for updating the patch. The patch looks mostly reasonable!
Some comments on the patch:
# Is the {{isRMActive()}} check required? refreshAll will be executed only if transitionToActive succeeds. In any case, if you do add it, the check should be common for both branches, i.e. *_if_else*.
# In the test, is the code below expecting transitionToActive to fail? If so, the RM should not be in the Active state. Why would the RM be Active if adminService fails to transition?
{code}
+try {
+  rm.adminService.transitionToActive(requestInfo);
+} catch (Exception e) {
+  assertTrue("Error when transitioning to Active mode".contains(e
+  .getMessage()));
+}
+assertEquals(HAServiceState.ACTIVE, rm.getRMContext().getHAServiceState());
{code}
# Have you verified the test locally? I suspect the test may exit in the middle since you are changing the scheduler configuration. The scheduler configuration is loaded during transitionToStandby, which fails to load, and then *System.exit* is called.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2571) RM to support YARN registry

2015-08-25 Thread Srikanth Sampath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710946#comment-14710946
 ] 

Srikanth Sampath commented on YARN-2571:


What is the status of this patch, [~ste...@apache.org]? I am considering using the YARN registry for the MR AppMaster in YARN-3972 and want to take some learnings from here.

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: BB2015-05-TBR
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710955#comment-14710955
 ] 

Rohith Sharma K S commented on YARN-3893:
-

To be clearer on the 3rd point: the {{handleTransitionToStandBy}} call will exit if transitionToStandby fails. This transition may fail because active services are initialized during the transition; CS initialization loads the new capacity-scheduler conf, which results in a wrong default queue capacity value and causes the standby transition to fail.
4. Instead of having a separate class FatalEventCountDispatcher, can it be made inline?

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710984#comment-14710984
 ] 

Hadoop QA commented on YARN-4044:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 49s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 59s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  50m 39s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m  0s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestRM |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752166/0002-YARN-4044.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / af78767 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8903/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8903/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8903/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8903/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8903/console |


This message was automatically generated.

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch
>
>
> SystemMetricsPublisher need to expose an appUpdated api to update any change 
> for a running application.
> Events can be 
>   - change of queue for a running application.
> - change of application priority for a running application.
> This ticket intends to handle both RM and timeline side changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo

2015-08-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711024#comment-14711024
 ] 

Naganarasimha G R commented on YARN-4078:
-

Yes [~rohithsharma] & [~varun_saxena], in most places it is handled; only in these 2 places is it typecast (and in AppInfo the cast is unguarded). But the point is: why do we even need a guarded check? Can't we expose both methods ({{getPendingResourceRequestForAttempt}} & {{getApplicationAttempt}}) in {{YarnScheduler}}?

> Unchecked typecast to AbstractYarnScheduler in AppInfo
> --
>
> Key: YARN-4078
> URL: https://issues.apache.org/jira/browse/YARN-4078
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
>
> Currently getPendingResourceRequestForAttempt is present in 
> {{AbstractYarnScheduler}}.
> *But in AppInfo,  we are calling this method by typecasting it to 
> AbstractYarnScheduler, which is incorrect.*
> Because if a custom scheduler is to be added, it will implement 
> YarnScheduler, not AbstractYarnScheduler.
> This method should be moved to YarnScheduler or it should have a guarded 
> check like in other places (RMAppAttemptBlock.getBlackListedNodes) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711036#comment-14711036
 ] 

Varun Saxena commented on YARN-3893:


A few additional comments:

* In the exception block below, i.e. the exception block after the call to refreshAll: if {{YarnConfiguration.shouldRMFailFast(getConfig())}} is true, we merely post a fatal event and do not return or throw an exception. This leads to a success audit log being printed for the transition to active, which doesn't look quite right, because we encountered a problem during the transition. We should either return or throw a ServiceFailedException here as well. Both are acceptable since the RM would be down later anyway, but I would prefer the exception.
{code}
324 } catch (Exception e) {
325   if (isRMActive() && YarnConfiguration.shouldRMFailFast(getConfig())) {
326     rmContext.getDispatcher().getEventHandler()
327         .handle(new RMFatalEvent(RMFatalEventType.ACTIVE_REFRESH_FAIL, e));
328   }else{
329     rm.handleTransitionToStandBy();
330     throw new ServiceFailedException(
331         "Error on refreshAll during transistion to Active", e);
332   }
333 }
334 RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
335     "RMHAProtocolService");
336   }
{code}

* In TestRMHA, the import below is unused.
{code}
import io.netty.channel.MessageSizeEstimator.Handle;
{code}

* A nit: there should be a space before else.
{code}
328   }else{
329 rm.handleTransitionToStandBy();
{code}

* In the added test, the assert is not required in the exception block after the first call to transitionToActive.

* Maybe we can add an assert in the test that the service state is STANDBY after the call to transitionToActive with an incorrect capacity scheduler config and fail-fast set to false (see the sketch below).
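
An illustrative fragment of such an assertion, written in the style of the test snippet quoted earlier ({{rm}} and {{requestInfo}} are assumed from that test, not defined here):

{code}
// Illustrative only: with a bad scheduler config and fail-fast disabled,
// the failed transition should leave the RM in STANDBY, not ACTIVE.
try {
  rm.adminService.transitionToActive(requestInfo);
  Assert.fail("transitionToActive should fail with a bad scheduler config");
} catch (Exception e) {
  // expected: refreshAll fails and the RM falls back to standby
}
Assert.assertEquals(HAServiceState.STANDBY,
    rm.getRMContext().getHAServiceState());
{code}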

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2571) RM to support YARN registry

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711043#comment-14711043
 ] 

Hadoop QA commented on YARN-2571:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12697782/YARN-2571-010.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / eee0d45 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8905/console |


This message was automatically generated.

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: BB2015-05-TBR
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-25 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711050#comment-14711050
 ] 

Shiwei Guo commented on YARN-3933:
--

So should I open a new issue instead?

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-25 Thread Lavkesh Lahngir (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711056#comment-14711056
 ] 

Lavkesh Lahngir commented on YARN-3933:
---

Is it related to this?
https://issues.apache.org/jira/browse/YARN-4067

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711083#comment-14711083
 ] 

Hadoop QA commented on YARN-3250:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m  0s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 10s | The applied patch generated  3 
new checkstyle issues (total was 17, now 20). |
| {color:red}-1{color} | whitespace |   0m  4s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 39s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 59s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  53m 50s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 111m 56s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752185/0003-YARN-3250.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / eee0d45 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8904/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8904/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8904/console |


This message was automatically generated.

> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-25 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711132#comment-14711132
 ] 

Shiwei Guo commented on YARN-3933:
--

I think so, and so is [YARN-4045|https://issues.apache.org/jira/browse/YARN-4045]. The negative value in the root queue is caused by calling updateRootQueueMetrics again for the same containerId. Our cluster has the capacity to run 13000+ containers, but the web UI says:

- Containers Running: -26546
- Memory Used: -82.38 TB
- VCores Used: -26451

Luckily it hasn't affected scheduling yet. (A small sketch of the kind of guard that avoids the double update follows.)
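
A self-contained sketch of the failure pattern and the kind of idempotency guard that prevents it; this is an illustration with hypothetical names, not the scheduler's actual code:

{code}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Releasing the same container twice drives a "running" counter negative
// unless the release path is made idempotent per container id.
public class DoubleReleaseSketch {
  private final AtomicInteger containersRunning = new AtomicInteger();
  private final Set<String> released =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  void allocate(String containerId) {
    containersRunning.incrementAndGet();
  }

  void release(String containerId) {
    // Guard: only the first release for a given id updates the counter.
    if (released.add(containerId)) {
      containersRunning.decrementAndGet();
    }
  }

  public static void main(String[] args) {
    DoubleReleaseSketch metrics = new DoubleReleaseSketch();
    metrics.allocate("container_1");
    metrics.release("container_1");
    metrics.release("container_1"); // duplicate event is ignored
    System.out.println("Containers running: " + metrics.containersRunning.get());
  }
}
{code}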

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-08-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711147#comment-14711147
 ] 

Sunil G commented on YARN-4044:
---

Test case failures are not related.

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch
>
>
> SystemMetricsPublisher need to expose an appUpdated api to update any change 
> for a running application.
> Events can be 
>   - change of queue for a running application.
> - change of application priority for a running application.
> This ticket intends to handle both RM and timeline side changes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711164#comment-14711164
 ] 

Varun Saxena commented on YARN-3893:


Moreover, the fail-fast configuration doesn't quite work as expected here. If the capacity scheduler configuration is wrong, initialization will fail again and the JVM will exit, which in essence is exactly the same as the other case. IMO we can handle the fail-fast-true case the same way as earlier.

The reason it works in the test (the JVM does not exit) is that you have passed a CapacitySchedulerConfiguration object to MockRM. Since CapacitySchedulerConfiguration is not an instanceof YarnConfiguration, this leads to a new YarnConfiguration object being created and passed to ResourceManager (a small sketch of this wrapping pattern follows). When you change the configuration in the test and set the queue capacity to 200, the change is not reflected in the Configuration object held by the ResourceManager class. That is why the JVM does not exit when we transition to standby.
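
A small sketch of the wrapping pattern being described, assuming the common Hadoop convention of reusing a YarnConfiguration only when the caller already passed one; this is illustrative, not the exact ResourceManager/MockRM code:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfWrappingSketch {
  public static void main(String[] args) {
    Configuration original = new Configuration();
    original.setInt("yarn.scheduler.capacity.root.default.capacity", 100);

    // If the caller did not pass a YarnConfiguration, a copy is created.
    // The copy constructor snapshots the properties, so later mutations of
    // "original" (as the test does) are not visible to the copy.
    Configuration used = (original instanceof YarnConfiguration)
        ? original : new YarnConfiguration(original);

    original.setInt("yarn.scheduler.capacity.root.default.capacity", 200);

    System.out.println("original: "
        + original.getInt("yarn.scheduler.capacity.root.default.capacity", -1));
    System.out.println("used:     "
        + used.getInt("yarn.scheduler.capacity.root.default.capacity", -1));
    // Illustrative expectation: original prints 200, used still prints 100.
  }
}
{code}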

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711171#comment-14711171
 ] 

Varun Saxena commented on YARN-3893:


Sorry, I meant we can handle the case of the fail-fast config being *false* the same way as we were doing in earlier patches. Otherwise, checking for fail-fast doesn't make any difference, because both code paths lead to the same result.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4013) Publisher V2 should write the unmanaged AM flag and application priority

2015-08-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G resolved YARN-4013.
---
Resolution: Won't Fix

Already handled in YARN-4058

> Publisher V2 should write the unmanaged AM flag and application priority
> 
>
> Key: YARN-4013
> URL: https://issues.apache.org/jira/browse/YARN-4013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Sunil G
>
> Upon rebase the branch, I find we need to redo the similar work for V2 
> publisher:
> https://issues.apache.org/jira/browse/YARN-3543
> Also Application priority can be published along with this. YARN-3948 for 
> reference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization

2015-08-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711181#comment-14711181
 ] 

Junping Du commented on YARN-3011:
--

I see. Thanks Varun for the reminder on this. "all daemons it should be explicitly set to true so that daemons can crash instead of hanging around" is not wrong, but it could make the system more fragile if we fail to catch every possible recoverable, or unrecoverable-but-not-global, exception like the one in this JIRA. We may need to think more about this.

> NM dies because of the failure of resource localization
> ---
>
> Key: YARN-3011
> URL: https://issues.apache.org/jira/browse/YARN-3011
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Wang Hao
>Assignee: Varun Saxena
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3011.001.patch, YARN-3011.002.patch, 
> YARN-3011.003.patch, YARN-3011.004.patch
>
>
> NM dies because of IllegalArgumentException when localize resource.
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar,
>  1416997035456, FILE, null }
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/,
>  1419831474153, FILE, null }
> 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
> at org.apache.hadoop.fs.Path.(Path.java:135)
> at org.apache.hadoop.fs.Path.(Path.java:94)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)  
>   
> at java.lang.Thread.run(Thread.java:745)
> 2014-12-29 13:43:58,701 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> Initializing user hadoop
> 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting 
> connection close header...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711196#comment-14711196
 ] 

Varun Saxena commented on YARN-3011:


[~djp], the only thing we can do here is read this value from configuration and set it to true in daemons if it is not configured.
That way, in production clusters, if an exception is leading to the daemon crashing frequently and we find that it is not a very big issue (i.e. the daemon can still work normally), we can at least set the configuration to false in the config file.
Right now, even that option is not there.
Thoughts? (A rough sketch of what I mean follows.)

I can probably raise a JIRA for this, and the discussion (even if it is not fixed) can carry on there.
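
A rough sketch of the idea, using the existing {{yarn.dispatcher.exit-on-error}} key ({{Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY}}); defaulting it to true for daemons while honouring an explicit override in the config file is the suggestion under discussion, not committed behaviour:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.event.AsyncDispatcher;
import org.apache.hadoop.yarn.event.Dispatcher;

public class DispatcherConfigSketch {
  public static AsyncDispatcher createDaemonDispatcher(Configuration conf) {
    // Instead of hard-coding exit-on-error to true for daemons, read it from
    // configuration with true as the daemon default, so operators can set it
    // to false in yarn-site.xml when a known, recoverable exception recurs.
    boolean exitOnError =
        conf.getBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);

    Configuration dispatcherConf = new Configuration(conf);
    dispatcherConf.setBoolean(
        Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, exitOnError);

    AsyncDispatcher dispatcher = new AsyncDispatcher();
    dispatcher.init(dispatcherConf);
    return dispatcher;
  }

  public static void main(String[] args) {
    AsyncDispatcher dispatcher = createDaemonDispatcher(new YarnConfiguration());
    dispatcher.start();
    dispatcher.stop();
  }
}
{code}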

> NM dies because of the failure of resource localization
> ---
>
> Key: YARN-3011
> URL: https://issues.apache.org/jira/browse/YARN-3011
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Wang Hao
>Assignee: Varun Saxena
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3011.001.patch, YARN-3011.002.patch, 
> YARN-3011.003.patch, YARN-3011.004.patch
>
>
> NM dies because of IllegalArgumentException when localize resource.
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar,
>  1416997035456, FILE, null }
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/,
>  1419831474153, FILE, null }
> 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
> at org.apache.hadoop.fs.Path.(Path.java:135)
> at org.apache.hadoop.fs.Path.(Path.java:94)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)  
>   
> at java.lang.Thread.run(Thread.java:745)
> 2014-12-29 13:43:58,701 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> Initializing user hadoop
> 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting 
> connection close header...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711201#comment-14711201
 ] 

Rohith Sharma K S commented on YARN-3893:
-

There are two types of refresh that can happen: 1. yarn-site.xml refresh, 2. 
scheduler configuration refresh. Scheduler configurations are reloaded on every 
service initialization, which is by design. If there is an issue in the 
scheduler configuration, the fail-fast setting behaves the same whether it is 
true or false. The fail-fast configuration is useful when the admin makes a 
mistake in yarn-site.xml: with a wrong configuration in yarn-site.xml the RM 
service can still come up, whereas with a wrong scheduler configuration the 
service can NOT come up at all. *On a best-effort basis to keep the service 
up*, exception handling for yarn-site.xml and for scheduler configuration 
should therefore be different.

BTW, moving the RM to standby state would fill up the logs very quickly because 
the elector continuously tries to make it active. For any configuration issue, 
it is better to exit the JVM and notify the admin that the RM is down, so that 
the admin can check the logs and identify the problem.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711274#comment-14711274
 ] 

Varun Saxena commented on YARN-3893:


Hmm... my point of view is based on the fact that the service cannot be up 
unless at least one RM is active. The standby RM is not going to serve anything 
anyway.
Until the configuration of this RM is corrected, whether yarn-site or scheduler 
configuration, this RM cannot become active anyway (refreshAll will always 
fail). And one could argue there might be a silly mistake in the scheduler 
configuration too.

What we were doing in the earlier patch would not fill up the logs if the 
configuration is OK on the other RM. And if it is not OK on the other RM, the 
logs will fill up anyway whenever refreshAll fails because of something other 
than scheduler config (and fail-fast is false).
Fail-fast is true by default, and if the admin sets it to false, he will know 
what to expect.

But you could argue that an RM shutting down is a far more alarming signal for 
an admin, and that scheduler configurations are more important. I agree with 
that. Maybe we should always bring down an RM with a wrong configuration, 
because until the admin corrects the config (whether yarn-site or scheduler 
config), this RM cannot become active.

Let us take the opinion of a couple of others on this as well. We can do 
whatever the consensus is.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711278#comment-14711278
 ] 

Varun Saxena commented on YARN-3893:


In previous patches, we were delaying reinitialization until the next attempt 
to transition to active, rather than attempting it immediately as we have done 
here. Do you expect any issues with that?

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711290#comment-14711290
 ] 

Varun Saxena commented on YARN-3893:


Saw your comments above. We can't do what we were doing earlier because, as you 
say, the WebApp should be up even in standby. Let me think whether something 
else can be done.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons

2015-08-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4079:
---
Summary: Retrospect on the decision of making yarn.dispatcher.exit-on-error 
as true explicitly in daemons  (was: Retrospect on the decision of making 
yarn.dispatcher.exit-on-error as explicitly true in daemons)

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as explicitly true in daemons

2015-08-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4079:
---
Summary: Retrospect on the decision of making yarn.dispatcher.exit-on-error 
as explicitly true in daemons  (was: Retrospect on making 
yarn.dispatcher.exit-on-error as explicitly true in daemons)

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as 
> explicitly true in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4079) Retrospect on making yarn.dispatcher.exit-on-error as explicitly true in daemons

2015-08-25 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4079:
--

 Summary: Retrospect on making yarn.dispatcher.exit-on-error as 
explicitly true in daemons
 Key: YARN-4079
 URL: https://issues.apache.org/jira/browse/YARN-4079
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.1
Reporter: Varun Saxena
Assignee: Varun Saxena






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons

2015-08-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4079:
---
Description: 
Currently this config is explicitly set to true in all daemons so that they 
crash instead of hanging around. This seems correct, since a recoverable 
exception should be caught and handled rather than leaked through to 
AsyncDispatcher, and a non-recoverable one should lead to a crash anyway.

But it can make the system more fragile if we miss catching some recoverable 
exception.

Currently we do not even have the option of setting it to false in 
configuration, even if we wanted to.

We could probably read this value from configuration and default it to true in 
the daemons if it is not configured. That way, in production clusters, if an 
exception is crashing a daemon frequently and we find it is unavoidable but not 
a very big issue (i.e. the daemon can still work normally for the most part), 
we can at least set the configuration to false in the config file.
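
A minimal sketch of that approach, assuming the property name stays 
yarn.dispatcher.exit-on-error (the helper below is illustrative, not the 
actual daemon code):

{code}
// Illustrative sketch: read the exit-on-error flag from configuration,
// defaulting to true so daemons keep today's crash-on-error behavior
// unless an admin explicitly sets the property to false in yarn-site.xml.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.event.AsyncDispatcher;

public class DispatcherSetup {
  public static AsyncDispatcher createDispatcher(Configuration conf) {
    // Default to true (exit the daemon on a dispatcher error) when the
    // property is not configured at all.
    boolean exitOnError =
        conf.getBoolean("yarn.dispatcher.exit-on-error", true);
    conf.setBoolean("yarn.dispatcher.exit-on-error", exitOnError);

    AsyncDispatcher dispatcher = new AsyncDispatcher();
    dispatcher.init(conf);  // the dispatcher picks the flag up from conf
    return dispatcher;
  }
}
{code}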

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Currently in all daemons this config is explicitly set to true so that 
> daemons can crash instead of hanging around. While this seems to be correct, 
> as a  recoverable exception should be caught and handled and NOT leaked 
> through to AsyncDispatcher. And a non recoverable one should lead to a crash 
> anyways.
> But this can make system more fragile in case we miss to catch all 
> recoverable exceptions.
> Currently we do not even have an option of setting it to false in 
> configuration, even if we would want. 
> Probably we can read this value from configuration and set it to true in 
> daemons if not configured.
> This way in production clusters if there is an exception which is leading to 
> the daemon crashing frequently and we find that its unavoidable but not a 
> very big issue(i.e daemon can still work normally for most part), we can 
> atleast set the configuration to false in config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711415#comment-14711415
 ] 

Varun Saxena commented on YARN-4079:


cc [~djp]

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Currently in all daemons this config is explicitly set to true so that 
> daemons can crash instead of hanging around. While this seems to be correct, 
> as a  recoverable exception should be caught and handled and NOT leaked 
> through to AsyncDispatcher. And a non recoverable one should lead to a crash 
> anyways.
> But this can make system more fragile in case we miss to catch all 
> recoverable exceptions.
> Currently we do not even have an option of setting it to false in 
> configuration, even if we would want. 
> Probably we can read this value from configuration and set it to true in 
> daemons if not configured.
> This way in production clusters if there is an exception which is leading to 
> the daemon crashing frequently and we find that its unavoidable but not a 
> very big issue(i.e daemon can still work normally for most part), we can 
> atleast set the configuration to false in config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711414#comment-14711414
 ] 

Varun Saxena commented on YARN-4079:


This JIRA has been raised based on the discussion on YARN-3011 
(https://issues.apache.org/jira/browse/YARN-3011?focusedCommentId=1471&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1471).
 We can probably decide here if we want to handle it as above or not.

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Currently in all daemons this config is explicitly set to true so that 
> daemons can crash instead of hanging around. While this seems to be correct, 
> as a  recoverable exception should be caught and handled and NOT leaked 
> through to AsyncDispatcher. And a non recoverable one should lead to a crash 
> anyways.
> But this can make system more fragile in case we miss to catch all 
> recoverable exceptions.
> Currently we do not even have an option of setting it to false in 
> configuration, even if we would want. 
> Probably we can read this value from configuration and set it to true in 
> daemons if not configured.
> This way in production clusters if there is an exception which is leading to 
> the daemon crashing frequently and we find that its unavoidable but not a 
> very big issue(i.e daemon can still work normally for most part), we can 
> atleast set the configuration to false in config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons

2015-08-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711431#comment-14711431
 ] 

Junping Du commented on YARN-4079:
--

Thanks for filing this JIRA, [~varun_saxena].
bq. Probably we can read this value from configuration and set it to true in 
daemons if not configured. This way in production clusters if there is an 
exception which is leading to the daemon crashing frequently and we find that 
its unavoidable but not a very big issue(i.e daemon can still work normally for 
most part), we can atleast set the configuration to false in config file.
I don't mean that we should simply make this configuration public and allow 
users to set it to false to disable the exit-on-error behavior when an 
exception happens. That could make things worse if critical exceptions occur 
but the NMs/RM keep running as if nothing happened. We should think more about 
this.

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Currently in all daemons this config is explicitly set to true so that 
> daemons can crash instead of hanging around. While this seems to be correct, 
> as a  recoverable exception should be caught and handled and NOT leaked 
> through to AsyncDispatcher. And a non recoverable one should lead to a crash 
> anyways.
> But this can make system more fragile in case we miss to catch all 
> recoverable exceptions.
> Currently we do not even have an option of setting it to false in 
> configuration, even if we would want. 
> Probably we can read this value from configuration and set it to true in 
> daemons if not configured.
> This way in production clusters if there is an exception which is leading to 
> the daemon crashing frequently and we find that its unavoidable but not a 
> very big issue(i.e daemon can still work normally for most part), we can 
> atleast set the configuration to false in config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711462#comment-14711462
 ] 

Varun Saxena commented on YARN-4079:


Hmm... I wasn't necessarily thinking of making it public. Just adding a way for 
it to be read from config so that it can be set to false temporarily if 
required (in rare scenarios).
But is there something else we can do?

Maybe we could add an exclusion list of exceptions to be ignored, as sketched 
below. But the same exception might indicate a very critical bug in one area of 
the code and not in another, so that may not be a viable alternative either.
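
Purely to illustrate that alternative (the property name and class below are 
invented for this sketch; nothing like them exists in YARN today):

{code}
// Hypothetical sketch of an exception exclusion list for the dispatcher.
// "yarn.dispatcher.ignored-exceptions" is an invented property name.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

public class DispatcherExceptionFilter {
  private final Set<String> ignoredExceptions;

  public DispatcherExceptionFilter(Configuration conf) {
    // e.g. yarn.dispatcher.ignored-exceptions=java.lang.IllegalArgumentException
    this.ignoredExceptions = new HashSet<String>(Arrays.asList(
        conf.getStrings("yarn.dispatcher.ignored-exceptions", new String[0])));
  }

  public boolean shouldExitOn(Throwable t) {
    // Exit unless the admin explicitly excluded this exception type.
    return !ignoredExceptions.contains(t.getClass().getName());
  }
}
{code}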

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Currently in all daemons this config is explicitly set to true so that 
> daemons can crash instead of hanging around. While this seems to be correct, 
> as a  recoverable exception should be caught and handled and NOT leaked 
> through to AsyncDispatcher. And a non recoverable one should lead to a crash 
> anyways.
> But this can make system more fragile in case we miss to catch all 
> recoverable exceptions.
> Currently we do not even have an option of setting it to false in 
> configuration, even if we would want. 
> Probably we can read this value from configuration and set it to true in 
> daemons if not configured.
> This way in production clusters if there is an exception which is leading to 
> the daemon crashing frequently and we find that its unavoidable but not a 
> very big issue(i.e daemon can still work normally for most part), we can 
> atleast set the configuration to false in config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711464#comment-14711464
 ] 

Varun Saxena commented on YARN-4079:


Let us see what others think about how to handle this config.

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true 
> explicitly in daemons
> 
>
> Key: YARN-4079
> URL: https://issues.apache.org/jira/browse/YARN-4079
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Currently in all daemons this config is explicitly set to true so that 
> daemons can crash instead of hanging around. While this seems to be correct, 
> as a  recoverable exception should be caught and handled and NOT leaked 
> through to AsyncDispatcher. And a non recoverable one should lead to a crash 
> anyways.
> But this can make system more fragile in case we miss to catch all 
> recoverable exceptions.
> Currently we do not even have an option of setting it to false in 
> configuration, even if we would want. 
> Probably we can read this value from configuration and set it to true in 
> daemons if not configured.
> This way in production clusters if there is an exception which is leading to 
> the daemon crashing frequently and we find that its unavoidable but not a 
> very big issue(i.e daemon can still work normally for most part), we can 
> atleast set the configuration to false in config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4080) Capacity planning for long running services on YARN

2015-08-25 Thread MENG DING (JIRA)
MENG DING created YARN-4080:
---

 Summary: Capacity planning for long running services on YARN
 Key: YARN-4080
 URL: https://issues.apache.org/jira/browse/YARN-4080
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, resourcemanager
Reporter: MENG DING


YARN-1197 addresses the functionality of container resource resize. One major 
use case of this feature is for long running services managed by Slider to 
dynamically flex up and down resource allocation of individual components 
(e.g., HBase region server), based on application metrics/alerts obtained 
through third-party monitoring and policy engine. 

One key issue with increasing container resource at any point of time is that 
the additional resource needed by the application component may not be 
available *on the specific node*. In this case, we need to rely on preemption 
logic to reclaim the required resource back from other (preemptable) 
applications running on the same node. But this may not be possible today 
because:
* preemption doesn't consider constraints of pending resource requests, such as 
hard locality requirements, user limits, etc (being addressed in YARN-2154 and 
possibly in YARN-3769?) 
* there may not be any preemptable container available due to the fact that no 
application is over its guaranteed capacity.

What we need, ideally, is a way for YARN to support future capacity planning of 
long running services. At the minimum, we need to provide a way to let YARN 
know about the resource usage prediction/pattern of a long running service. And 
given this knowledge, YARN should be able to preempt resources from other 
applications to accommodate the resource needs of the long running service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-25 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3816:
-
Attachment: YARN-3816-YARN-2928-v1.patch

Updated the PoC patch with the following changes:
- rebased the patch against the latest updates on YARN-2928 (application 
table, reader API, etc.)
- added a reader API for reading aggregation metrics
- some refactoring
Not yet included (will come in the next patch):
- a configuration to enable/disable accumulation of aggregation metrics (AREA 
calculation)
- addressing some important comments above
- tests in TestHBaseTimelineStorage
- other unit tests.


> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711592#comment-14711592
 ] 

Junping Du commented on YARN-3816:
--

bq. When we are doing a sum operation, what if the value after the sum is 
outside the range of the data type? Do we assume it will be within limits? 
Especially aggregation values over a longer time period may well go beyond 
limits.
That's a very good point, Varun! I think we can assume the calculations will 
stay within limits in most cases, and that a proper exception will be thrown if 
a value goes out of range. What do you think?
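
For illustration only, one way the "throw when out of range" behavior could 
look for long-valued metrics (a sketch, not code from the patch; the check is 
the standard signed-overflow test):

{code}
// Sketch: overflow-checked accumulation of metric values, so an aggregate
// that exceeds the long range raises an exception instead of silently
// wrapping around.
public final class MetricSum {

  private MetricSum() {
  }

  public static long addChecked(long total, long value) {
    long sum = total + value;
    // Signed overflow happened iff both operands have the same sign and
    // the result's sign differs from them.
    if (((total ^ sum) & (value ^ sum)) < 0) {
      throw new ArithmeticException(
          "metric aggregation overflowed the long value range");
    }
    return sum;
  }
}
{code}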

> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4080) Capacity planning for long running services on YARN

2015-08-25 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711622#comment-14711622
 ] 

MENG DING commented on YARN-4080:
-

Not sure if the title accurately reflects the problem. If you think there is a 
better way to describe it, please suggest one.

For the use case presented in the description, one possible direction to 
consider is something like a dynamic host-based reservation (note that this is 
not the same as the current container reservation in YARN), for example:
* when asking for resources, an application can specify an initial resource 
capability plus a reserved resource capability on whatever host the container 
is launched on. For example, I can say I want 2GB of initial resource for a 
container and, once that container is launched, reserve up to 16GB for it on 
that host, because I expect the container's resource usage to fluctuate over 
time and sometimes peak at 16GB.
* if this reserved resource is not fully utilized, it can still be allocated to 
other applications, but the scheduler will mark the allocation as revocable, so 
that no critical service should use this chunk of resource
* when the scheduler allocates new resources, it should first consider 
resources that have not been reserved
* the preemption logic should also be able to preempt this kind of revocable 
resource if needed

The above is similar to the dynamic reservation feature being implemented in 
Mesos: https://issues.apache.org/jira/browse/MESOS-2018

I also took a look at YARN-1051 to see if the current reservation system in 
YARN could help here, but to the best of my knowledge it mainly addresses 
applications with a future start time and a predictable deadline. Please 
correct me if I am wrong.

Let me know if you have any thoughts, comments or ideas.
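
A rough sketch of what such a request could carry, purely for illustration 
(ReservedResourceRequest is an invented type, not existing YARN API; only 
Resource/Resources are real classes):

{code}
// Hypothetical API sketch for the "initial capability plus host-local
// reserved capability" idea described above. Not existing YARN API.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class ReservedResourceRequest {
  private final Resource initial;   // what the container starts with
  private final Resource reserved;  // ceiling reserved on the launch host

  public ReservedResourceRequest(Resource initial, Resource reserved) {
    this.initial = initial;
    this.reserved = reserved;
  }

  public Resource getInitial() {
    return initial;
  }

  public Resource getReserved() {
    return reserved;
  }

  public static ReservedResourceRequest example() {
    // Start with 2 GB / 1 vcore, but reserve up to 16 GB / 4 vcores on
    // whatever host the container lands on, to cover expected peaks.
    return new ReservedResourceRequest(
        Resources.createResource(2 * 1024, 1),
        Resources.createResource(16 * 1024, 4));
  }
}
{code}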

> Capacity planning for long running services on YARN
> ---
>
> Key: YARN-4080
> URL: https://issues.apache.org/jira/browse/YARN-4080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, resourcemanager
>Reporter: MENG DING
>
> YARN-1197 addresses the functionality of container resource resize. One major 
> use case of this feature is for long running services managed by Slider to 
> dynamically flex up and down resource allocation of individual components 
> (e.g., HBase region server), based on application metrics/alerts obtained 
> through third-party monitoring and policy engine. 
> One key issue with increasing container resource at any point of time is that 
> the additional resource needed by the application component may not be 
> available *on the specific node*. In this case, we need to rely on preemption 
> logic to reclaim the required resource back from other (preemptable) 
> applications running on the same node. But this may not be possible today 
> because:
> * preemption doesn't consider constraints of pending resource requests, such 
> as hard locality requirements, user limits, etc (being addressed in YARN-2154 
> and possibly in YARN-3769?) 
> * there may not be any preemptable container available due to the fact that 
> no application is over its guaranteed capacity.
> What we need, ideally, is a way for YARN to support future capacity planning 
> of long running services. At the minimum, we need to provide a way to let 
> YARN know about the resource usage prediction/pattern of a long running 
> service. And given this knowledge, YARN should be able to preempt resources 
> from other applications to accommodate the resource needs of the long running 
> service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2571) RM to support YARN registry

2015-08-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711631#comment-14711631
 ] 

Steve Loughran commented on YARN-2571:
--

All the registry code is there (slider uses it), except for two bits on the RM 
side:
# create the base user path on app launch (in case the app needs it). This 
needs to be done by a process with the right permissions on ZK; it also makes 
sure that the user path is created with the perms that allow the RM/admin to 
delete it
# purge entries on container/AM failure

There was push-back from the YARN team on #2 (that it's not for the RM to do). 
I do still think #2 is needed. Irrespective of that, there is a main() entry 
point in the 2.7+ code which offers a CLI to create the registry; it's just 
without docs or tests. Email me directly if you want to start using the code 
and I'll help you.

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: BB2015-05-TBR
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4080) Capacity planning for long running services on YARN

2015-08-25 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-4080:

Description: 
YARN-1197 addresses the functionality of container resource resize. One major 
use case of this feature is for long running services managed by Slider to 
dynamically flex up and down resource allocation of individual components 
(e.g., HBase region server), based on application metrics/alerts obtained 
through third-party monitoring and policy engine. 

One key issue with increasing container resource at any point of time is that 
the additional resource needed by the application component may not be 
available *on the specific node*. In this case, we need to rely on preemption 
logic to reclaim the required resource back from other (preemptable) 
applications running on the same node. But this may not be possible today 
because:
* preemption doesn't consider constraints of pending resource requests, such as 
hard locality requirements, user limits, etc (being addressed in YARN-2154 and 
possibly in YARN-3769?) 
* there may not be any preemptable container available due to the fact that no 
queue is over its guaranteed capacity.

What we need, ideally, is a way for YARN to support future capacity planning of 
long running services. At the minimum, we need to provide a way to let YARN 
know about the resource usage prediction/pattern of a long running service. And 
given this knowledge, YARN should be able to preempt resources from other 
applications to accommodate the resource needs of the long running service.

  was:
YARN-1197 addresses the functionality of container resource resize. One major 
use case of this feature is for long running services managed by Slider to 
dynamically flex up and down resource allocation of individual components 
(e.g., HBase region server), based on application metrics/alerts obtained 
through third-party monitoring and policy engine. 

One key issue with increasing container resource at any point of time is that 
the additional resource needed by the application component may not be 
available *on the specific node*. In this case, we need to rely on preemption 
logic to reclaim the required resource back from other (preemptable) 
applications running on the same node. But this may not be possible today 
because:
* preemption doesn't consider constraints of pending resource requests, such as 
hard locality requirements, user limits, etc (being addressed in YARN-2154 and 
possibly in YARN-3769?) 
* there may not be any preemptable container available due to the fact that no 
application is over its guaranteed capacity.

What we need, ideally, is a way for YARN to support future capacity planning of 
long running services. At the minimum, we need to provide a way to let YARN 
know about the resource usage prediction/pattern of a long running service. And 
given this knowledge, YARN should be able to preempt resources from other 
applications to accommodate the resource needs of the long running service.


> Capacity planning for long running services on YARN
> ---
>
> Key: YARN-4080
> URL: https://issues.apache.org/jira/browse/YARN-4080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, resourcemanager
>Reporter: MENG DING
>
> YARN-1197 addresses the functionality of container resource resize. One major 
> use case of this feature is for long running services managed by Slider to 
> dynamically flex up and down resource allocation of individual components 
> (e.g., HBase region server), based on application metrics/alerts obtained 
> through third-party monitoring and policy engine. 
> One key issue with increasing container resource at any point of time is that 
> the additional resource needed by the application component may not be 
> available *on the specific node*. In this case, we need to rely on preemption 
> logic to reclaim the required resource back from other (preemptable) 
> applications running on the same node. But this may not be possible today 
> because:
> * preemption doesn't consider constraints of pending resource requests, such 
> as hard locality requirements, user limits, etc (being addressed in YARN-2154 
> and possibly in YARN-3769?) 
> * there may not be any preemptable container available due to the fact that 
> no queue is over its guaranteed capacity.
> What we need, ideally, is a way for YARN to support future capacity planning 
> of long running services. At the minimum, we need to provide a way to let 
> YARN know about the resource usage prediction/pattern of a long running 
> service. And given this knowledge, YARN should be able to preempt resources 
> from other applications to accommodate the resource needs of the long running 
> service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-25 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-4081:
---

 Summary: Add support for multiple resource types in the Resource 
class
 Key: YARN-4081
 URL: https://issues.apache.org/jira/browse/YARN-4081
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev


For adding support for multiple resource types, we need to add support for this 
in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-25 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4081:

Attachment: YARN-4081-YARN-3926.001.patch

Uploaded a patch with support for multiple resource types in the Resource class.

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3926) Extend the YARN resource model for easier resource-type management and profiles

2015-08-25 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711651#comment-14711651
 ] 

Varun Vasudev commented on YARN-3926:
-

I've created a YARN-3926 branch for this feature.

> Extend the YARN resource model for easier resource-type management and 
> profiles
> ---
>
> Key: YARN-3926
> URL: https://issues.apache.org/jira/browse/YARN-3926
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Proposal for modifying resource model and profiles.pdf
>
>
> Currently, there are efforts to add support for various resource-types such 
> as disk(YARN-2139), network(YARN-2140), and  HDFS bandwidth(YARN-2681). These 
> efforts all aim to add support for a new resource type and are fairly 
> involved efforts. In addition, once support is added, it becomes harder for 
> users to specify the resources they need. All existing jobs have to be 
> modified, or have to use the minimum allocation.
> This ticket is a proposal to extend the YARN resource model to a more 
> flexible model which makes it easier to support additional resource-types. It 
> also considers the related aspect of “resource profiles” which allow users to 
> easily specify the various resources they need for any given container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-25 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711652#comment-14711652
 ] 

Varun Vasudev commented on YARN-4081:
-

[~leftnoteasy], [~asuresh], [~jianhe] - can you please review?

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows

2015-08-25 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3458:
--
Attachment: YARN-3458-8.patch

Rebase

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2015-08-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711671#comment-14711671
 ] 

Junping Du commented on YARN-4061:
--

Thanks [~gtCarrera9] for working on a document for this. I have some high-level 
comments on the current design before moving to the details:
1. We should be very careful about using HDFS to cache incremental updates, 
i.e. incoming timeline entities. HDFS itself is not optimized for random-write 
performance, especially with a large number of writers (assuming each NM has a 
TimelineWriter).
2. Implementing a redo log on top of HDFS is very complicated, and it would 
achieve a goal similar to the WAL (Write Ahead Log) in HBase, wouldn't it? If 
so, do we plan to borrow code/components from HBase for this?
3. I think it makes more sense for HDFS to serve as backup storage.

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711707#comment-14711707
 ] 

Hadoop QA commented on YARN-3458:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m  2s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  9s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 54s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   0m 20s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common compilation is broken. |
| {color:green}+1{color} | findbugs |   0m 20s | The patch does not introduce 
any new Findbugs (version ) warnings. |
| {color:red}-1{color} | yarn tests |   0m 19s | Tests failed in 
hadoop-yarn-common. |
| | |  39m 15s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-yarn-common |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752275/YARN-3458-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / eee0d45 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8906/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8906/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8906/console |


This message was automatically generated.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711711#comment-14711711
 ] 

Vinod Kumar Vavilapalli commented on YARN-2884:
---

[~jianhe] mentioned this offline, and the configuration approach concerns me 
too.

Stepping back, I think the current discovery of the scheduler by the apps is 
completely broken. Distributed Shell, for example, works only because it is a 
Java application and the NM happens to put HADOOP_CONF_DIR on the classpath. 
Irrespective of this JIRA, we need to fix scheduler discovery for the apps. The 
current way of depending on server configuration is unreliable in the face of 
rolling upgrades.

The specific solution in this JIRA further breaks rolling upgrades and 
configuration updates. If and when an admin forces client configuration 
changes, the config written by the node will go out of sync. Overall this makes 
the situation worse.

I'd suggest that we start moving towards a better scheduler-discovery model. We 
have already done similar work with the Timeline service (YARN-3039). We can 
implement part of that here: environment-based discovery. We can simply have an 
environment variable, say YARN_SCHEDULER_ADDRESS, set by the NodeManager into 
the AM env and respected as the first-level discovery mechanism. As we add more 
first-class discovery mechanisms, this env variable can take lower precedence. 
This approach isn't too far from your current solution either; instead of 
pointing to a conf-dir env variable, you point to a scheduler-address env 
variable directly.
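
A minimal sketch of that ordering (illustrative only; YARN_SCHEDULER_ADDRESS is 
the hypothetical environment variable named above, and the fallback simply 
reads the existing yarn.resourcemanager.scheduler.address property):

{code}
// Sketch of env-first scheduler discovery for an AM: prefer the address
// injected by the NodeManager into the AM environment, fall back to the
// node's client-side configuration (today's behavior).
import org.apache.hadoop.conf.Configuration;

public class SchedulerDiscovery {
  public static String resolveSchedulerAddress(Configuration conf) {
    String fromEnv = System.getenv("YARN_SCHEDULER_ADDRESS");
    if (fromEnv != null && !fromEnv.isEmpty()) {
      return fromEnv;  // first-level discovery mechanism
    }
    return conf.get("yarn.resourcemanager.scheduler.address");
  }
}
{code}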

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711730#comment-14711730
 ] 

Hadoop QA commented on YARN-3816:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 13s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  1s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 53s | The applied patch generated  8  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 20s | The applied patch generated  
13 new checkstyle issues (total was 38, now 51). |
| {color:red}-1{color} | whitespace |   0m 16s | The patch has 20  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m 46s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m  9s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |   1m 32s | Tests failed in 
hadoop-yarn-server-timelineservice. |
| | |  51m 27s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-api |
| FindBugs | module:hadoop-yarn-server-timelineservice |
| Failed unit tests | 
hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorage |
|   | 
hadoop.yarn.server.timelineservice.storage.TestFileSystemTimelineWriterImpl |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752267/YARN-3816-YARN-2928-v1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 3c36922 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/diffJavadocWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8908/console |


This message was automatically generated.

> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).

[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels

2015-08-25 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711738#comment-14711738
 ] 

Benoit Sigoure commented on YARN-3238:
--

What's the setting to tune down to avoid the 45-minute timeout? I'd like the code 
to fail fast.
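
A hedged sketch only (the exact keys are assumptions, not confirmed in this 
thread): the 45 retries mentioned in the description below usually map to the 
IPC-level ipc.client.connect.max.retries.on.timeouts setting, and the NM-proxy 
retry window to the yarn.client.nodemanager-connect.* keys; tuning them down 
should make the client fail faster.

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FailFastNMClientConf {
  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // IPC layer: cut the per-connect retries down from the default of 45.
    conf.setInt("ipc.client.connect.max.retries.on.timeouts", 3);
    // NM-proxy retry policy (assumed keys): shrink the retry window and interval.
    conf.setLong("yarn.client.nodemanager-connect.max-wait-ms", 30000L);
    conf.setLong("yarn.client.nodemanager-connect.retry-interval-ms", 10000L);
    return conf;
  }
}
{code}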

> Connection timeouts to nodemanagers are retried at multiple levels
> --
>
> Key: YARN-3238
> URL: https://issues.apache.org/jira/browse/YARN-3238
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3238.001.patch
>
>
> The IPC layer will retry connection timeouts automatically (see Client.java), 
> but we are also retrying them with YARN's RetryPolicy put in place when the 
> NM proxy is created.  This causes a two-level retry mechanism where the IPC 
> layer has already retried quite a few times (45 by default) for each YARN 
> RetryPolicy error that is retried.  The end result is that NM clients can 
> wait a very, very long time for the connection to finally fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-25 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: YARN-3717.20150825-1.patch

Fixing the following test case failures related to the patch:
TestNMClient.testNMClient
TestNMClient.testNMClientNoCleanupOnStop
TestYarnClient.testAMMRTokens
The other test failures are build issues.

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711766#comment-14711766
 ] 

Subru Krishnan commented on YARN-2884:
--

[~vinodkv], thanks for your feedback. Let me first reiterate what I said in 
response to [~jlowe]'s similar observation: I agree not only that we should move 
towards a better scheduler discovery model, but that we should completely decouple 
apps from platform configs. The reason we didn't go down the path you suggested is 
that it adds a dependency on updating all the AMs (which, unlike the Timeline 
service, we don't own) to use the new discovery mechanism. The current approach, 
though non-ideal, is agnostic to the AM. To force the AMs to adopt the new 
mechanism, we would have to prevent access to the NM's config. If all of you are 
OK with that consequence, I can go ahead and make the change.

I think it would be better to open a separate JIRA to address the decoupling of 
app & platform config, with an initial sub-task to handle scheduler discovery 
through the environment as you suggested. In that case, we'll update the patch to 
remove the changes in ContainerLaunch that override HADOOP_CONF_DIR, and AFAIK 
[~jianhe] is OK with the rest of the patch, which he can commit ASAP. This will 
unblock us to use AMRMProxy with at least self-contained apps like MapReduce and 
Spark, which are our major workloads.
 

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711804#comment-14711804
 ] 

Hadoop QA commented on YARN-4081:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 17s | Findbugs (version ) appears to 
be broken on YARN-3926. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 5 new or modified test files. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 56s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 34s | The applied patch generated  
89 new checkstyle issues (total was 10, now 99). |
| {color:red}-1{color} | whitespace |   0m 20s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   4m 32s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  55m  6s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 101m 44s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-api |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752272/YARN-4081-YARN-3926.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-3926 / c95993c |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8907/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8907/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8907/console |


This message was automatically generated.

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711863#comment-14711863
 ] 

Subru Krishnan commented on YARN-2884:
--

Just to add more context based on the offline discussions with @jian he: as an 
immediate first step, we can add YARN_SCHEDULER_ADDRESS environment-based 
scheduler discovery in the *AMRMClient*. This will not cover all the AMs, as 
AMRMClient is not used by custom AMs. Moreover, apps can bring their own client 
JAR, and that version can be older as long as it's backward compatible.
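
A minimal sketch of what that immediate first step might look like in front of 
AMRMClient, assuming the proposed YARN_SCHEDULER_ADDRESS variable (not an 
existing YARN constant) is exported into the AM environment; the value simply 
overrides the scheduler-address config key before the client starts:

{code}
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class EnvAwareAmRmClientFactory {
  public static AMRMClient<ContainerRequest> create() {
    YarnConfiguration conf = new YarnConfiguration();
    // Proposed variable name, not yet a YARN constant.
    String envAddr = System.getenv("YARN_SCHEDULER_ADDRESS");
    if (envAddr != null && !envAddr.isEmpty()) {
      // Overriding the config key is enough for the default (non-HA)
      // ClientRMProxy lookup to pick up the proxied scheduler address.
      conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, envAddr);
    }
    AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
    amRmClient.init(conf);
    amRmClient.start();
    return amRmClient;
  }
}
{code}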

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711941#comment-14711941
 ] 

Hadoop QA commented on YARN-3717:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 50s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   2m 59s | Site still builds. |
| {color:red}-1{color} | checkstyle |   1m 57s | The applied patch generated  3 
new checkstyle issues (total was 16, now 18). |
| {color:green}+1{color} | whitespace |   0m  8s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   7m 20s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 59s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 11s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  53m 36s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 121m 41s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752284/YARN-3717.20150825-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / eee0d45 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8909/console |


This message was automatically generated.

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job

[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711987#comment-14711987
 ] 

Jian He commented on YARN-2884:
---

To make this move faster, I think we can have a separate JIRA to address the 
scheduler address discovery problem. At least MR jobs can run without the 
change.

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-25 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4082:


 Summary: Container shouldn't be killed when node's label updated.
 Key: YARN-4082
 URL: https://issues.apache.org/jira/browse/YARN-4082
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


From YARN-2920, containers will be killed if the partition of a node is changed. 
Instead of killing containers, we should update resource-usage-by-partition 
properly when a node's partition is updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4082:
-
Component/s: (was: api)
 (was: client)
 (was: resourcemanager)
 capacityscheduler

> Container shouldn't be killed when node's label updated.
> 
>
> Key: YARN-4082
> URL: https://issues.apache.org/jira/browse/YARN-4082
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> From YARN-2920, containers will be killed if partition of a node changed. 
> Instead of killing containers, we should update resource-usage-by-partition 
> properly when node's partition updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4082:
-
Component/s: (was: capacityscheduler)
 capacity scheduler

> Container shouldn't be killed when node's label updated.
> 
>
> Key: YARN-4082
> URL: https://issues.apache.org/jira/browse/YARN-4082
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> From YARN-2920, containers will be killed if partition of a node changed. 
> Instead of killing containers, we should update resource-usage-by-partition 
> properly when node's partition updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4082:
-
Attachment: YARN-4082.1.patch

Uploaded initial patch.

> Container shouldn't be killed when node's label updated.
> 
>
> Key: YARN-4082
> URL: https://issues.apache.org/jira/browse/YARN-4082
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4082.1.patch
>
>
> From YARN-2920, containers will be killed if partition of a node changed. 
> Instead of killing containers, we should update resource-usage-by-partition 
> properly when node's partition updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-08-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan reassigned YARN-4083:


Assignee: Subru Krishnan  (was: Kishore Chaliparambil)

> Add a discovery mechanism for the scheduler address
> 
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-08-25 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-4083:


 Summary: Add a discovery mechanism for the scheduler address
 Key: YARN-4083
 URL: https://issues.apache.org/jira/browse/YARN-4083
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Subru Krishnan
Assignee: Kishore Chaliparambil


We introduce the notion of an RMProxy, running on each node (or once per rack). 
Upon start, the AM is forced (via tokens and configuration) to direct all its 
requests to a new service running on the NM that provides a proxy to the 
central RM. 

This gives us a place to:
1) perform distributed scheduling decisions
2) throttle mis-behaving AMs
3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-08-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4083:
-
Description: (was: We introduce the notion of an RMProxy, running on 
each node (or once per rack). Upon start the AM is forced (via tokens and 
configuration) to direct all its requests to a new services running on the NM 
that provide a proxy to the central RM. 

This give us a place to:
1) perform distributed scheduling decisions
2) throttling mis-behaving AMs
3) mask the access to a federation of RMs)

> Add a discovery mechanism for the scheduler address
> 
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-08-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4083:
-
Description: Today many apps like Distributed Shell, REEF, etc. rely on the 
fact that the HADOOP_CONF_DIR of the NM is on the classpath to discover the 
scheduler address. This JIRA proposes the addition of an explicit discovery 
mechanism for the scheduler address.

> Add a discovery mechanism for the scheduler address
> 
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> Today many apps like Distributed Shell, REEF, etc rely on the fact that the 
> HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler 
> address. This JIRA proposes the addition of an explicit discovery mechanism 
> for the scheduler address



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-08-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4083:
-
Issue Type: Improvement  (was: Sub-task)
Parent: (was: YARN-2877)

> Add a discovery mechanism for the scheduler address
> 
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> Today many apps like Distributed Shell, REEF, etc rely on the fact that the 
> HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler 
> address. This JIRA proposes the addition of an explicit discovery mechanism 
> for the scheduler address



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4083) Add a discovery mechanism for the scheduler address

2015-08-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712206#comment-14712206
 ] 

Subru Krishnan commented on YARN-4083:
--

Based on the [discussion | 
https://issues.apache.org/jira/browse/YARN-2884?focusedCommentId=14711711] with 
[~jianhe], [~vinodkv], [~kishorch] and [~jlowe] in YARN-2884, I will implement an 
initial scheduler address discovery mechanism based on an environment variable, 
say YARN_SCHEDULER_ADDRESS.

> Add a discovery mechanism for the scheduler address
> 
>
> Key: YARN-4083
> URL: https://issues.apache.org/jira/browse/YARN-4083
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>
> Today many apps like Distributed Shell, REEF, etc rely on the fact that the 
> HADOOP_CONF_DIR of the NM is on the classpath to discover the scheduler 
> address. This JIRA proposes the addition of an explicit discovery mechanism 
> for the scheduler address



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712210#comment-14712210
 ] 

Varun Saxena commented on YARN-3816:


[~djp], thanks for the patch. A few comments and questions.

# This pertains to what we are doing in YARN-4053. I see that we will be using a 
column qualifier postfix/suffix to identify whether a metric is an aggregated one 
or not. In your case, this would mean an OR filter of the form metric=0 OR 
metric0=1 while applying metric filters on the reader side. We were thinking of 
using a similar scheme to identify a metric as long or double. If we use the same 
scheme for long or double, we may end up with 4 ORs for a single metric. Maybe we 
can use cell tags for aggregation, or not support mixed data types. cc 
[~jrottinghuis].
# IIUC, the TimelineMetric#toAggregate flag would indicate whether a metric is to 
be aggregated or not. Maybe in TimelineCollector#aggregateMetrics, we should do 
aggregation only if the flag is enabled.
# In TimelineCollector#appendAggregatedMetricsToEntities, is there any reason we 
are creating separate TimelineEntity objects for each metric? Maybe create a 
single entity containing a set of metrics.
# 3 new maps have been introduced in TimelineCollector and these are used as the 
base to calculate the aggregated value. What if the daemon crashes?
# In TimelineMetricCalculator some functions have duplicate if conditions for 
long.
# In TimelineMetricCalculator#sum, to avoid negative values due to overflow, we 
can change conditions like the one below
{code}
if (n1 instanceof Integer){
  return new Integer(n1.intValue() + n2.intValue());
}
{code}
to something like this?
{code}
if (n1 instanceof Integer) {
  // Widen to Long if the int sum would exceed Integer.MAX_VALUE.
  if (Integer.MAX_VALUE - n1.intValue() < n2.intValue()) {
    return new Long(n1.longValue() + n2.longValue());
  } else {
    return new Integer(n1.intValue() + n2.intValue());
  }
}
{code}
We need not support up to BigInteger or BigDecimal, but as you said above, we can 
throw an exception for unsupported types (a sketch follows after this list).
# In TimelineMetric#aggregateTo, maybe use getValues instead of getValuesJAXB?
# Also, I was wondering if TimelineMetric#aggregateTo should be moved to some 
util class. TimelineMetric is part of the object model and exposed to the client, 
and IIUC aggregateTo won't be called by the client.
# What is EntityColumnPrefix#AGGREGATED_METRICS meant for?
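
A minimal sketch (not the patch's actual code) of the widen-on-overflow behaviour 
plus an exception for unsupported Number types, as suggested in item 6 above:

{code}
// Sketch only; the real TimelineMetricCalculator#sum signature may differ.
public final class SumSketch {
  public static Number sum(Number n1, Number n2) {
    if (n1 instanceof Integer) {
      long wide = n1.longValue() + n2.longValue();
      if (wide > Integer.MAX_VALUE || wide < Integer.MIN_VALUE) {
        // Widen to Long when the exact sum no longer fits in an int.
        return Long.valueOf(wide);
      }
      return Integer.valueOf((int) wide);
    }
    if (n1 instanceof Long) {
      return Long.valueOf(n1.longValue() + n2.longValue());
    }
    if (n1 instanceof Float || n1 instanceof Double) {
      return Double.valueOf(n1.doubleValue() + n2.doubleValue());
    }
    throw new IllegalArgumentException(
        "Unsupported Number type for aggregation: " + n1.getClass().getName());
  }
}
{code}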

> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712209#comment-14712209
 ] 

Subru Krishnan commented on YARN-2884:
--

Thanks [~jianhe], I have created YARN-4083.

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712220#comment-14712220
 ] 

Varun Saxena commented on YARN-3816:


BTW, the TimelineMetric#toAggregate flag is meant to indicate that a metric 
needs to be aggregated. But are we planning to use it to indicate that a 
metric is an aggregated metric as well? If yes, we should probably set this 
flag for each metric processed in 
TimelineCollector#appendAggregatedMetricsToEntities. 

> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4082) Container shouldn't be killed when node's label updated.

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712224#comment-14712224
 ] 

Hadoop QA commented on YARN-4082:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 23s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 50s | The applied patch generated  9 
new checkstyle issues (total was 299, now 308). |
| {color:red}-1{color} | whitespace |   0m  5s | The patch has 23  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 30s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  53m 50s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  93m 46s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752332/YARN-4082.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8911/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8911/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8911/console |


This message was automatically generated.

> Container shouldn't be killed when node's label updated.
> 
>
> Key: YARN-4082
> URL: https://issues.apache.org/jira/browse/YARN-4082
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4082.1.patch
>
>
> From YARN-2920, containers will be killed if partition of a node changed. 
> Instead of killing containers, we should update resource-usage-by-partition 
> properly when node's partition updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712230#comment-14712230
 ] 

Li Lu commented on YARN-3816:
-

bq. Also I was wondering if TimelineMetric#aggregateTo should be moved to some 
util class. TimelineMetric is part of object model and exposed to client. And 
IIUC aggregateTo wont be called by client.
Sorry, but I think putting the aggregateTo method here is fine. I don't really 
like the idea of putting these static methods into a util class just because they 
look like utils. This is more of a subjective topic, but I would hope our util 
methods are general enough for the entire module. Aggregating metrics is not like 
reversing an integer's binary representation, which is general enough for the 
whole module to count as a general "util". Here, aggregating metrics is clearly a 
general operation on timeline metrics. I didn't get the "called by client" part 
of the discussion: our object model is used by both ourselves and our clients, so 
why is "not called by clients" a problem for our object model (the offline 
aggregation, for example, will also use this aggregation method)? 

> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712261#comment-14712261
 ] 

Varun Saxena commented on YARN-3816:


Hmm... not being called by the client is not a problem; I did not mean that. I 
was primarily thinking of these classes as data classes with getters and setters, 
with functional logic detached from them. And this method is not using any member 
variables either.
But yes, this method won't be generic enough at a global level. There is a point 
to that as well.

Currently the aggregateTo method is not static. I think it should be made static 
even if it's kept inside TimelineMetric, as it's not using any member variables. 



> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712269#comment-14712269
 ] 

Robert Kanter commented on YARN-3528:
-

[~brahma], have you had a chance to look at the testcase failures?

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" shows up many places in the test suite where this 
> practise has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.
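
A minimal, purely illustrative sketch of the dynamic port allocation the 
description asks for: bind to port 0 and let the OS hand back a currently free 
ephemeral port instead of hard-coding 12345.

{code}
import java.io.IOException;
import java.net.ServerSocket;

public final class FreePortFinder {
  // Bind to port 0 so the OS picks a currently free ephemeral port.
  public static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
{code}

Note the returned port can still be grabbed by another process before the test 
binds it, so letting the service under test bind port 0 itself is even safer 
where the code supports it.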



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712271#comment-14712271
 ] 

Robert Kanter commented on YARN-3528:
-

Oops, wrong person.  [~brahmareddy]

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" shows up many places in the test suite where this 
> practise has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712308#comment-14712308
 ] 

Rohith Sharma K S commented on YARN-3893:
-

Hi [~varun_saxena], trying to understand your point: my suggestion is to exit the 
RM if there is any configuration issue during refreshAll in 
{{AdminService#transitionToActive}}. I gave the reasons for bringing the RM JVM 
down rather than keeping the JVM alive in an earlier 
[comment|https://issues.apache.org/jira/browse/YARN-3893?focusedCommentId=14711201&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14711201];
 do you have any concern with exiting the RM for configuration issues?

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Kishore Chaliparambil (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishore Chaliparambil updated YARN-2884:

Attachment: YARN-2884-V10.patch

Uploaded YARN-2884-V10.patch. The changes in ContainerLaunch have been removed.

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, 
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, 
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4065) container-executor error should include effective user id

2015-08-25 Thread Casey Brotherton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712391#comment-14712391
 ] 

Casey Brotherton commented on YARN-4065:


Absolutely. Will start working on it.

> container-executor error should include effective user id
> -
>
> Key: YARN-4065
> URL: https://issues.apache.org/jira/browse/YARN-4065
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Casey Brotherton
>Priority: Trivial
>
> When container-executor fails to access it's config file, the following 
> message will be thrown:
> {code}
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container executor initialization is : 24
> ExitCodeException exitCode=24: Invalid conf file provided : 
> /etc/hadoop/conf/container-executor.cfg
> {code}
> The real problem may be a change in the container-executor not running as set 
> uid root.
> From:
> https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html
> {quote}
> The container-executor program must be owned by root and have the permission 
> set ---sr-s---.
> {quote}
> The error message could be improved by printing out the effective user id 
> with the error message, and possibly the executable trying to access the 
> config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3854) Add localization support for docker images

2015-08-25 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712426#comment-14712426
 ] 

Jun Gong commented on YARN-3854:


I think what we need is a private registry. Push the local image to the private 
registry when (or before) submitting the app; then the NM could pull it from the 
private registry.

BTW: We could build the private registry using HDFS as the storage 
backend (https://github.com/hex108/docker-registry-driver-hdfs), and it works 
well in our cluster.

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>
> We need the ability to localize images from HDFS and load them for use when 
> launching docker containers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712446#comment-14712446
 ] 

Hadoop QA commented on YARN-2884:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  22m  8s | Pre-patch trunk has 7 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 6 new or modified test files. |
| {color:green}+1{color} | javac |   8m 49s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m  1s | The applied patch generated  1 
new checkstyle issues (total was 237, now 237). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 51s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   7m 32s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  56m 17s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 124m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752377/YARN-2884-V10.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8912/console |


This message was automatically generated.

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, 
> YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, 
> YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle mis-behaving AMs
> 3) mask access to a federation of RMs
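To make the proxying idea above concrete, here is a minimal, hypothetical Java sketch of a forwarding proxy. The class name and wiring are invented for illustration and are not taken from the attached patches; only the {{ApplicationMasterProtocol}} interface it implements is the real YARN API.

{code:java}
// Hypothetical sketch only -- not taken from the YARN-2884 patches.
// A node-local service could expose this implementation to the AM while
// holding a real client connection to the central RM.
import java.io.IOException;

import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterResponse;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class AmRmForwardingProxy implements ApplicationMasterProtocol {

  // Connection to the real (or federated) RM; how it is obtained is out of
  // scope for this sketch.
  private final ApplicationMasterProtocol rm;

  public AmRmForwardingProxy(ApplicationMasterProtocol rm) {
    this.rm = rm;
  }

  @Override
  public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException, IOException {
    return rm.registerApplicationMaster(request);
  }

  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    // Single interception point: local scheduling decisions, throttling of
    // mis-behaving AMs, or routing across a federation of RMs could be
    // plugged in here before or after the forwarded call.
    return rm.allocate(request);
  }

  @Override
  public FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) throws YarnException, IOException {
    return rm.finishApplicationMaster(request);
  }
}
{code}

Because every AM request funnels through {{allocate()}}, that single method is the natural hook for the throttling and federation use cases listed above.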



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712477#comment-14712477
 ] 

Sunil G commented on YARN-3250:
---

The latest patch looks good to me. Could you please check whether the test 
failures are related to this patch?
Thank you.

> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-25 Thread Ved Prakash Pandey (JIRA)
Ved Prakash Pandey created YARN-4084:


 Summary: Yarn should allow to skip hadoop-yarn-server-tests 
project from build..
 Key: YARN-4084
 URL: https://issues.apache.org/jira/browse/YARN-4084
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.1
Reporter: Ved Prakash Pandey


For fast compilation, one can try to skip test code compilation by using 
{{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this option 
is used, because it depends on the hadoop-yarn-server-tests project. 
Below is the exception:
{noformat}
[ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
attachment with classifier: tests in module project: 
org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
module from the module-set.
{noformat}
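For reference, this is the kind of failure one hits when building from the top level with test compilation skipped, e.g. with an invocation along the lines of (exact goals/profiles may vary):
{noformat}
mvn clean install -Dmaven.test.skip=true
{noformat}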





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-25 Thread Ved Prakash Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ved Prakash Pandey updated YARN-4084:
-
Priority: Minor  (was: Major)

> Yarn should allow to skip hadoop-yarn-server-tests project from build..
> ---
>
> Key: YARN-4084
> URL: https://issues.apache.org/jira/browse/YARN-4084
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.7.1
>Reporter: Ved Prakash Pandey
>Priority: Minor
>
> For fast compilation, one can try to skip test code compilation by using 
> {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
> option is used, because it depends on the hadoop-yarn-server-tests project. 
> Below is the exception:
> {noformat}
> [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
> attachment with classifier: tests in module project: 
> org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
> module from the module-set.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-25 Thread Ved Prakash Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ved Prakash Pandey updated YARN-4084:
-
Description: 
For fast compilation, one can try to skip test code compilation by using 
{{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this option 
is used, because it depends on the hadoop-yarn-server-tests project. 
Below is the exception:
{noformat}
[ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
attachment with classifier: tests in module project: 
org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
module from the module-set.
{noformat}



  was:
For fast compilation one can try to skip the test code compilation by using 
{{-Dmaven.test.skip=true}}. But when yarn-project fails to compile when this 
option is used. This is because, it depends on hadoop-yarn-server-tests 
project. 
Below is the exception :
{noformat}
[ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
attachment with classifier: tests in module project: 
org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
module from the module-set.
{noformat}




> Yarn should allow to skip hadoop-yarn-server-tests project from build..
> ---
>
> Key: YARN-4084
> URL: https://issues.apache.org/jira/browse/YARN-4084
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.7.1
>Reporter: Ved Prakash Pandey
>Priority: Minor
>
> For fast compilation, one can try to skip test code compilation by using 
> {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
> option is used, because it depends on the hadoop-yarn-server-tests project. 
> Below is the exception:
> {noformat}
> [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
> attachment with classifier: tests in module project: 
> org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
> module from the module-set.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-25 Thread Ved Prakash Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ved Prakash Pandey updated YARN-4084:
-
Attachment: YARN-4084.patch

> Yarn should allow to skip hadoop-yarn-server-tests project from build..
> ---
>
> Key: YARN-4084
> URL: https://issues.apache.org/jira/browse/YARN-4084
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.7.1
>Reporter: Ved Prakash Pandey
>Priority: Minor
> Attachments: YARN-4084.patch
>
>
> For fast compilation, one can try to skip test code compilation by using 
> {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
> option is used, because it depends on the hadoop-yarn-server-tests project. 
> Below is the exception:
> {noformat}
> [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
> attachment with classifier: tests in module project: 
> org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
> module from the module-set.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4084) Yarn should allow to skip hadoop-yarn-server-tests project from build..

2015-08-25 Thread Ved Prakash Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712560#comment-14712560
 ] 

Ved Prakash Pandey commented on YARN-4084:
--

To fix this, I have created a new profile called 
{{enable-yarn-server-test-module}} in the hadoop-yarn-server pom.

To include this module, one has to pass {{-Penable-yarn-server-test-module}} 
during compilation.
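As a usage sketch (the exact invocations are my assumption based on the description above, not taken from the patch), the two build modes would look roughly like:
{noformat}
# fast build: skip test compilation; with the patch applied, the
# hadoop-yarn-server-tests module should be skipped (assumption), so the
# hadoop-yarn-dist assembly no longer fails
mvn clean install -Dmaven.test.skip=true

# full build: explicitly re-enable the hadoop-yarn-server-tests module
mvn clean install -Penable-yarn-server-test-module
{noformat}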

> Yarn should allow to skip hadoop-yarn-server-tests project from build..
> ---
>
> Key: YARN-4084
> URL: https://issues.apache.org/jira/browse/YARN-4084
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.7.1
>Reporter: Ved Prakash Pandey
>Priority: Minor
> Attachments: YARN-4084.patch
>
>
> For fast compilation, one can try to skip test code compilation by using 
> {{-Dmaven.test.skip=true}}. But yarn-project fails to compile when this 
> option is used, because it depends on the hadoop-yarn-server-tests project. 
> Below is the exception:
> {noformat}
> [ERROR] Assembly: hadoop-yarn-dist is not configured correctly: Cannot find 
> attachment with classifier: tests in module project: 
> org.apache.hadoop:hadoop-yarn-server-tests:jar:2.7.0. Please exclude this 
> module from the module-set.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2884) Proxying all AM-RM communications

2015-08-25 Thread Kishore Chaliparambil (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishore Chaliparambil updated YARN-2884:

Attachment: YARN-2884-V11.patch

Removed the ApplicationConstants.java file from the patch because it is not 
required.

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V10.patch, 
> YARN-2884-V11.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, 
> YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, 
> YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start, the AM is forced (via tokens and configuration) to direct 
> all its requests to a new service running on the NM that provides a proxy to 
> the central RM. 
> This gives us a place to:
> 1) perform distributed scheduling decisions
> 2) throttle mis-behaving AMs
> 3) mask access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)