[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-11-19 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219089#comment-14219089
 ] 

Devaraj K commented on YARN-2356:
-

Thanks Sunil for the updated patch.
1.
bq. I could see that this return of exit code from each of the printXXXReport() 
was causing nested if in the caller side, and was becoming less readable. Also 
killApplication is already rethrowing exception and handling similar way.. 
Kindly share your thoughts on this.
Either way is fine with me. I thought it would be good if we could avoid rehandling 
the same exception (a rough sketch of such handling follows below).

2. For the tests,

Can you remove the try-catch completely, like below? If the test code throws any 
exception, it means the test has failed, and there is no need to fail it again 
explicitly.

{code:java}
  @Test
  public void testGetContainerReportException() throws Exception {
    ApplicationCLI cli = createAndGetAppCLI();
    ApplicationId applicationId = ApplicationId.newInstance(1234, 5);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(
        applicationId, 1);
    long cntId = 1;
    ContainerId containerId1 = ContainerId.newContainerId(attemptId, cntId++);
    when(client.getContainerReport(containerId1)).thenThrow(
        new ApplicationNotFoundException("History file for application"
            + applicationId + " is not found"));
    int exitCode = cli.run(new String[] { "container", "-status",
        containerId1.toString() });
    verify(sysOut).println(
        "Application for Container with id '" + containerId1
            + "' doesn't exist in RM or Timeline Server.");
    Assert.assertNotSame("should return non-zero exit code.", 0, exitCode);

    ContainerId containerId2 = ContainerId.newContainerId(attemptId, cntId++);
{code}

3. The patch does not apply using the 'patch' command; can you check this for the 
next patch?
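
For illustration only, here is a minimal, hypothetical sketch of the kind of 
caller-side handling discussed in point 1 (catch the not-found exception, print a 
concise message, and return a non-zero exit code). Names like 
{{printContainerReport}} are stand-ins and are not taken from the patch:

{code:java}
import java.io.PrintStream;

import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

// Hypothetical sketch; not the actual ApplicationCLI implementation.
public class ContainerStatusSketch {
  private final PrintStream sysout = System.out;

  int containerStatus(String containerId) {
    try {
      printContainerReport(containerId);
    } catch (ApplicationNotFoundException e) {
      // Print a concise message instead of letting the full stack trace reach the console.
      sysout.println("Application for Container with id '" + containerId
          + "' doesn't exist in RM or Timeline Server.");
      return -1;   // non-zero exit code, matching the test expectation in point 2 above
    }
    return 0;
  }

  // Stand-in for ApplicationCLI#printContainerReport, which rethrows the exception.
  void printContainerReport(String containerId) throws ApplicationNotFoundException {
    throw new ApplicationNotFoundException("container report is unavailable");
  }
}
{code}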


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status* or *applicationattempt -status* or *container 
> status* commands can suppress exceptions such as ApplicationNotFound, 
> ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in 
> RM or History Server. 
> For example, the exception below could be suppressed better:
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocol

[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.

2014-11-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219090#comment-14219090
 ] 

zhihai xu commented on YARN-2679:
-

Hi [~kasha],
Good suggestion, I uploaded a new patch YARN-2679.001.patch to address your 
comment.
thanks
zhihai

> add container launch prepare time metrics to NM.
> 
>
> Key: YARN-2679
> URL: https://issues.apache.org/jira/browse/YARN-2679
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2679.000.patch, YARN-2679.001.patch
>
>
> add metrics in NodeManagerMetrics to get prepare time to launch container.
> The prepare time is the duration between sending 
> ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
> ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2679) add container launch prepare time metrics to NM.

2014-11-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2679:

Attachment: YARN-2679.001.patch

> add container launch prepare time metrics to NM.
> 
>
> Key: YARN-2679
> URL: https://issues.apache.org/jira/browse/YARN-2679
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2679.000.patch, YARN-2679.001.patch
>
>
> add metrics in NodeManagerMetrics to get prepare time to launch container.
> The prepare time is the duration between sending 
> ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
> ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-11-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219019#comment-14219019
 ] 

Sunil G commented on YARN-1963:
---

Thank you [~leftnoteasy]

bq. I'd prefer highest + default priority.
This configuration will make it easier for admins to configure. I am still not fully 
convinced about accepting lower priorities by default, but I also do not see any use 
case where those lower priorities are a problem. Yes, we can have this as highest + 
default (I already have this one). Instead of labels per queue, it will be changed to 
a highest priority per queue. I will update the doc, and my patch, accordingly.

bq. extra complexity both in implementation and configuration
I agree that the config and implementation are more complicated for this part. As 
you mentioned, if a preemption feature related to YARN-2069 runs in parallel, then 
the issue which I pointed out can be solved. If user-limit factor preemption also 
considers priority, we can get the headroom which is needed; the user has to enable 
this preemption though. If this workaround is fine for resolving the issue mentioned, 
then I will file a JIRA to relate priority with user-limit preemption. Kindly share 
your thoughts.

bq. I didn't see any related code in YarnClient
Yes, this code is now in YarnRunner, which is part of MapReduce. I wanted to see it 
in YarnClient.

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: YARN Application Priorities Design.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2014-11-19 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219009#comment-14219009
 ] 

Rohith commented on YARN-2865:
--

Thanks Karthik, JianHe and Tsuyoshi for your reviews.

> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles exception thrown while transitioningToActive and reset 
> activeServices. But it misses out clearing RMcontext apps/nodes details and 
> ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2315) FairScheduler: Set current capacity in addition to capacity

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218986#comment-14218986
 ] 

Hudson commented on YARN-2315:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6579 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6579/])
YARN-2315. FairScheduler: Set current capacity in addition to capacity. (Zhihai 
Xu via kasha) (kasha: rev a9a0cc3679432774154a07d3157ffa0a43e0bf01)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> FairScheduler: Set current capacity in addition to capacity
> ---
>
> Key: YARN-2315
> URL: https://issues.apache.org/jira/browse/YARN-2315
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2315.001.patch, YARN-2315.002.patch, 
> YARN-2315.003.patch, YARN-2315.patch
>
>
> Should use setCurrentCapacity instead of setCapacity to configure used 
> resource capacity for FairScheduler.
> In function getQueueInfo of FSQueue.java, we call setCapacity twice with 
> different parameters, so the first call is overridden by the second call. 
> queueInfo.setCapacity((float) getFairShare().getMemory() /
> scheduler.getClusterResource().getMemory());
> queueInfo.setCapacity((float) getResourceUsage().getMemory() /
> scheduler.getClusterResource().getMemory());
> We should change the second setCapacity call to setCurrentCapacity to 
> configure the current used capacity.
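
For illustration, a minimal sketch (not the actual FSQueue#getQueueInfo code) of the 
corrected pair of calls described above, with the memory figures passed in as plain 
parameters:

{code:java}
import org.apache.hadoop.yarn.api.records.QueueInfo;

// Hypothetical helper; only illustrates the setCapacity/setCurrentCapacity fix.
public final class QueueCapacitySketch {
  private QueueCapacitySketch() {}

  static void fillCapacities(QueueInfo queueInfo, int fairShareMb, int usedMb,
      int clusterMb) {
    // capacity: the queue's fair share as a fraction of the cluster
    queueInfo.setCapacity((float) fairShareMb / clusterMb);
    // currentCapacity: the resources actually in use (previously this second call
    // also used setCapacity and overwrote the value above)
    queueInfo.setCurrentCapacity((float) usedMb / clusterMb);
  }
}
{code}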



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2315) FairScheduler: Set current capacity in addition to capacity

2014-11-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2315:
---
Summary: FairScheduler: Set current capacity in addition to capacity  (was: 
FairScheduler: Should use setCurrentCapacity instead of setCapacity to 
configure used resource capacity for FairScheduler.)

> FairScheduler: Set current capacity in addition to capacity
> ---
>
> Key: YARN-2315
> URL: https://issues.apache.org/jira/browse/YARN-2315
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2315.001.patch, YARN-2315.002.patch, 
> YARN-2315.003.patch, YARN-2315.patch
>
>
> Should use setCurrentCapacity instead of setCapacity to configure used 
> resource capacity for FairScheduler.
> In function getQueueInfo of FSQueue.java, we call setCapacity twice with 
> different parameters, so the first call is overridden by the second call. 
> queueInfo.setCapacity((float) getFairShare().getMemory() /
> scheduler.getClusterResource().getMemory());
> queueInfo.setCapacity((float) getResourceUsage().getMemory() /
> scheduler.getClusterResource().getMemory());
> We should change the second setCapacity call to setCurrentCapacity to 
> configure the current used capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2315) FairScheduler: Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.

2014-11-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2315:
---
Summary: FairScheduler: Should use setCurrentCapacity instead of 
setCapacity to configure used resource capacity for FairScheduler.  (was: 
Should use setCurrentCapacity instead of setCapacity to configure used resource 
capacity for FairScheduler.)

> FairScheduler: Should use setCurrentCapacity instead of setCapacity to 
> configure used resource capacity for FairScheduler.
> --
>
> Key: YARN-2315
> URL: https://issues.apache.org/jira/browse/YARN-2315
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2315.001.patch, YARN-2315.002.patch, 
> YARN-2315.003.patch, YARN-2315.patch
>
>
> Should use setCurrentCapacity instead of setCapacity to configure used 
> resource capacity for FairScheduler.
> In function getQueueInfo of FSQueue.java, we call setCapacity twice with 
> different parameters, so the first call is overridden by the second call. 
> queueInfo.setCapacity((float) getFairShare().getMemory() /
> scheduler.getClusterResource().getMemory());
> queueInfo.setCapacity((float) getResourceUsage().getMemory() /
> scheduler.getClusterResource().getMemory());
> We should change the second setCapacity call to setCurrentCapacity to 
> configure the current used capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2315) Should use setCurrentCapacity instead of setCapacity to configure used resource capacity for FairScheduler.

2014-11-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218977#comment-14218977
 ] 

Karthik Kambatla commented on YARN-2315:


+1. 

> Should use setCurrentCapacity instead of setCapacity to configure used 
> resource capacity for FairScheduler.
> ---
>
> Key: YARN-2315
> URL: https://issues.apache.org/jira/browse/YARN-2315
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2315.001.patch, YARN-2315.002.patch, 
> YARN-2315.003.patch, YARN-2315.patch
>
>
> Should use setCurrentCapacity instead of setCapacity to configure used 
> resource capacity for FairScheduler.
> In function getQueueInfo of FSQueue.java, we call setCapacity twice with 
> different parameters, so the first call is overridden by the second call. 
> queueInfo.setCapacity((float) getFairShare().getMemory() /
> scheduler.getClusterResource().getMemory());
> queueInfo.setCapacity((float) getResourceUsage().getMemory() /
> scheduler.getClusterResource().getMemory());
> We should change the second setCapacity call to setCurrentCapacity to 
> configure the current used capacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2679) add container launch prepare time metrics to NM.

2014-11-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218970#comment-14218970
 ] 

Karthik Kambatla commented on YARN-2679:


How about renaming the metric to {{containerLaunchDuration}} and updating the 
method names accordingly? 
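
For illustration, a hypothetical metrics2-style sketch of such a renamed metric. This 
is not the actual NodeManagerMetrics class or the patch; class and method names are 
illustrative:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Hypothetical sketch of a renamed container launch duration metric.
@Metrics(about = "Container launch metrics", context = "yarn")
public class ContainerLaunchMetricsSketch {

  @Metric("Duration in ms to launch a container")
  MutableRate containerLaunchDuration;

  // Called when CONTAINER_LAUNCHED is handled; the caller computes the elapsed time
  // since the LAUNCH_CONTAINER event was sent.
  public void addContainerLaunchDuration(long durationMs) {
    containerLaunchDuration.add(durationMs);
  }
}
{code}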

> add container launch prepare time metrics to NM.
> 
>
> Key: YARN-2679
> URL: https://issues.apache.org/jira/browse/YARN-2679
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2679.000.patch
>
>
> add metrics in NodeManagerMetrics to get prepare time to launch container.
> The prepare time is the duration between sending 
> ContainersLauncherEventType.LAUNCH_CONTAINER event and receiving  
> ContainerEventType.CONTAINER_LAUNCHED event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) ClusterMetrics to include AM launch and register delays

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218969#comment-14218969
 ] 

Hudson commented on YARN-2802:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6578 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6578/])
YARN-2802. ClusterMetrics to include AM launch and register delays. (Zhihai Xu 
via kasha) (kasha: rev c90fb84aaa902e6676de65d0016dee3a5414eb95)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClusterMetrics.java


> ClusterMetrics to include AM launch and register delays
> ---
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.0
>
> Attachments: YARN-2802.000.patch, YARN-2802.001.patch, 
> YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, 
> YARN-2802.005.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.
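
For illustration, a minimal, self-contained sketch of how the two delays described 
above could be measured from event timestamps; names are illustrative and not taken 
from RMAppAttemptImpl:

{code:java}
// Hypothetical tracker; timestamps come from the corresponding event handlers.
public class AmDelayTrackerSketch {
  private long launchRequestedTime;   // when AMLauncherEventType.LAUNCH is sent
  private long launchedTime;          // when RMAppAttemptEventType.LAUNCHED arrives

  public void onLaunchRequested(long nowMs) {
    launchRequestedTime = nowMs;
  }

  // Returns the aMLaunchDelay value to record in the metrics.
  public long onLaunched(long nowMs) {
    launchedTime = nowMs;
    return launchedTime - launchRequestedTime;
  }

  // Returns the aMRegisterDelay value to record when REGISTERED arrives.
  public long onRegistered(long nowMs) {
    return nowMs - launchedTime;
  }
}
{code}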



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-11-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218965#comment-14218965
 ] 

Karthik Kambatla commented on YARN-2675:


Can we add unit tests to exercise all the newly added transitions? Otherwise, 
the patch looks good. [~vinodkv] - do the changes look okay to you as well? 

> the containersKilled metrics is not updated when the container is killed 
> during localization.
> -
>
> Key: YARN-2675
> URL: https://issues.apache.org/jira/browse/YARN-2675
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
> YARN-2675.002.patch, YARN-2675.003.patch
>
>
> The containersKilled metrics is not updated when the container is killed 
> during localization. We should add KILLING state in finished of 
> ContainerImpl.java to update killedContainer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218961#comment-14218961
 ] 

Hudson commented on YARN-2865:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6577 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6577/])
YARN-2865. Fixed RM to always create a new RMContext when transtions from 
StandBy to Active. Contributed by Rohith Sharmaks (jianhe: rev 
9cb8b75ba57f18639492bfa3b7e7c11c00bb3d3b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* hadoop-yarn-project/CHANGES.txt


> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles exception thrown while transitioningToActive and reset 
> activeServices. But it misses out clearing RMcontext apps/nodes details and 
> ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) ClusterMetrics to include AM launch and register delays

2014-11-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2802:
---
Summary: ClusterMetrics to include AM launch and register delays  (was: add 
AM container launch and register delay metrics in QueueMetrics to help diagnose 
performance issue.)

> ClusterMetrics to include AM launch and register delays
> ---
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch, YARN-2802.001.patch, 
> YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, 
> YARN-2802.005.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218956#comment-14218956
 ] 

Karthik Kambatla commented on YARN-2802:


This should be very useful. The patch looks good. +1

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch, YARN-2802.001.patch, 
> YARN-2802.002.patch, YARN-2802.003.patch, YARN-2802.004.patch, 
> YARN-2802.005.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2599) Standby RM should also expose some jmx and metrics

2014-11-19 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-2599:


Assignee: Rohith

> Standby RM should also expose some jmx and metrics
> --
>
> Key: YARN-2599
> URL: https://issues.apache.org/jira/browse/YARN-2599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Rohith
>
> YARN-1898 redirects jmx and metrics to the Active. As discussed there, we 
> need to separate out metrics displayed so the Standby RM can also be 
> monitored. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2014-11-19 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-2880:


Assignee: Rohith

> Add a test in TestRMRestart to make sure node labels will be recovered if it 
> is enabled
> ---
>
> Key: YARN-2880
> URL: https://issues.apache.org/jira/browse/YARN-2880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Rohith
>
> As suggested by [~ozawa], 
> [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
>  We should have a such test to make sure there will be no regression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2784) Yarn project module names in POM need to be consistent across hadoop project

2014-11-19 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2784:
-
Fix Version/s: 2.7.0  (was: 2.6.0)

> Yarn project module names in POM need to be consistent across hadoop project
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have the project name 
> hadoop-mapreduce/hadoop-yarn. This can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce '.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-11-19 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218883#comment-14218883
 ] 

Subru Krishnan commented on YARN-2738:
--

Thanks [~adhoot] for the updated patch. It looks mostly good. I don't see any way to 
override the defaults (_Agent_, _Admission Policy_, _Replanner_, etc.) at the system 
level; is this intentional? I understand that you do not want to allow configuring 
them per queue in the first iteration, but right now there is no option to override 
them even globally, as the defaults are hard-coded.

Are you planning to file a separate JIRA for the _Plan Follower_ work, as that seems 
to be the last piece for enabling reservations in _FairScheduler_? :)

> Add FairReservationSystem for FairScheduler
> ---
>
> Key: YARN-2738
> URL: https://issues.apache.org/jira/browse/YARN-2738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
> YARN-2738.003.patch
>
>
> Need to create a FairReservationSystem that will implement ReservationSystem 
> for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-11-19 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218855#comment-14218855
 ] 

Karthik Kambatla commented on YARN-2604:


Thanks for updating the patch. I should have made these comments on the 
earlier patch itself: 
# findbugs-exclude: there is a duplicate entry there.
# Looking at the current code, I wonder if all the logic about having two different 
maximumAllocations should be kept limited to AbstractYarnScheduler to avoid mistakes 
in the future. We can make the related fields private and accessible only through 
getter/setter methods. 
# {{updateMaxAllocation}} could take a {{Resource}} and a boolean to denote 
adding/removing a node, instead of a SchedulerNode. That way, we don't have to 
iterate through all the nodes in the removeNode case (a rough sketch follows below). 
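
For illustration, a minimal, hypothetical sketch of keeping the effective maximum 
allocation private to the scheduler and updating it with a {{Resource}} plus a 
boolean. The class and method names are illustrative, not the actual 
AbstractYarnScheduler API:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical sketch of the getter/setter idea discussed above.
public class EffectiveMaxAllocationSketch {
  private final Resource configuredMax;   // from yarn.scheduler.maximum-allocation-*
  private Resource largestNodeSeen;       // largest node currently registered

  public EffectiveMaxAllocationSketch(Resource configuredMax) {
    this.configuredMax = configuredMax;
  }

  // Called with the node's total capability and whether the node was added or removed;
  // only the add path is sketched here.
  public synchronized void updateMaxAllocation(Resource node, boolean added) {
    if (added && (largestNodeSeen == null
        || node.getMemory() > largestNodeSeen.getMemory())) {
      largestNodeSeen = node;
    }
  }

  // Scheduler code reads the effective maximum only through this getter.
  public synchronized Resource getMaximumAllocation() {
    if (largestNodeSeen == null) {
      return configuredMax;
    }
    int mem = Math.min(configuredMax.getMemory(), largestNodeSeen.getMemory());
    int vcores = Math.min(configuredMax.getVirtualCores(),
        largestNodeSeen.getVirtualCores());
    return Resource.newInstance(mem, vcores);
  }
}
{code}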



> Scheduler should consider max-allocation-* in conjunction with the largest 
> node
> ---
>
> Key: YARN-2604
> URL: https://issues.apache.org/jira/browse/YARN-2604
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch, 
> YARN-2604.patch
>
>
> If the scheduler max-allocation-* values are larger than the resources 
> available on the largest node in the cluster, an application requesting 
> resources between the two values will be accepted by the scheduler but the 
> requests will never be satisfied. The app essentially hangs forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2301) Improve yarn container command

2014-11-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218794#comment-14218794
 ] 

Jian He commented on YARN-2301:
---

bq. the port will be node's http
The NM can set up SSL, so the port can also be an https port.
bq. Can you please provide more precisely where this check is done internally?
I meant that {{Times.format}} is internally doing the check.
bq. pass the existing config object
This will cause a series of method signature changes. We could set the conf object 
in the rmContext and get it from the context.

> Improve yarn container command
> --
>
> Key: YARN-2301
> URL: https://issues.apache.org/jira/browse/YARN-2301
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Naganarasimha G R
>  Labels: usability
> Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2303.patch
>
>
> While running yarn container -list  command, some 
> observations:
> 1) the scheme (e.g. http/https  ) before LOG-URL is missing
> 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to 
> print as time format.
> 3) finish-time is 0 if container is not yet finished. May be "N/A"
> 4) May have an option to run as yarn container -list  OR  yarn 
> application -list-containers  also.  
> As attempt Id is not shown on console, this is easier for user to just copy 
> the appId and run it, may  also be useful for container-preserving AM 
> restart. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore

2014-11-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218766#comment-14218766
 ] 

Jian He commented on YARN-2765:
---

Thanks Jason for working on this.
It's useful to have leveldb as an option for RMStateStore, as it's more lightweight 
compared to the others. Reviewing the patch.

> Add leveldb-based implementation for RMStateStore
> -
>
> Key: YARN-2765
> URL: https://issues.apache.org/jira/browse/YARN-2765
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2765.patch, YARN-2765v2.patch
>
>
> It would be nice to have a leveldb option to the resourcemanager recovery 
> store. Leveldb would provide some benefits over the existing filesystem store 
> such as better support for atomic operations, fewer I/O ops per state update, 
> and far fewer total files on the filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2014-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218715#comment-14218715
 ] 

Hadoop QA commented on YARN-2865:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682428/YARN-2865.1.patch
  against trunk revision 73348a4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5883//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5883//console

This message is automatically generated.

> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles exception thrown while transitioningToActive and reset 
> activeServices. But it misses out clearing RMcontext apps/nodes details and 
> ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218666#comment-14218666
 ] 

Hadoop QA commented on YARN-2738:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682356/YARN-2738.003.patch
  against trunk revision 5bd048e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5882//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5882//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5882//console

This message is automatically generated.

> Add FairReservationSystem for FairScheduler
> ---
>
> Key: YARN-2738
> URL: https://issues.apache.org/jira/browse/YARN-2738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2738.001.patch, YARN-2738.002.patch, 
> YARN-2738.003.patch
>
>
> Need to create a FairReservationSystem that will implement ReservationSystem 
> for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218651#comment-14218651
 ] 

Hudson commented on YARN-2878:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6575 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6575/])
YARN-2878. Fix DockerContainerExecutor.apt.vm formatting. Contributed by Abin 
Shahab (jianhe: rev bc4ee5e06f89b2037e0967f8ba91089ced4b7f0e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Fix DockerContainerExecutor.apt.vm formatting
> -
>
> Key: YARN-2878
> URL: https://issues.apache.org/jira/browse/YARN-2878
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Fix For: 2.7.0
>
> Attachments: YARN-1964-docs.patch
>
>
> The formatting on DockerContainerExecutor.apt.vm is off. Needs correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2301) Improve yarn container command

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218629#comment-14218629
 ] 

Zhijie Shen commented on YARN-2301:
---

bq. i need to create new YARNConfigurations RMContainerImpl constructor and 
keep it.

We shouldn't construct a new yarn config object. Instead, when constructing 
RMContainerImpl, we need to pass the existing config object in, as we did for 
RMAppImpl and RMAppAttemptImpl.
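
For illustration, a minimal, hypothetical sketch of that style of passing the 
existing configuration in. The class name is illustrative, and the default of 
{{yarn.http.policy}} is assumed to be HTTP_ONLY:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: accept the RM's existing Configuration instead of constructing
// a new YarnConfiguration inside the class.
public class ContainerInfoSketch {
  private final Configuration conf;

  public ContainerInfoSketch(Configuration conf) {
    this.conf = conf;   // reuse the config the RM was started with
  }

  // e.g. decide the http vs https scheme for the container log URL
  public String logUrlScheme() {
    return "HTTPS_ONLY".equals(conf.get("yarn.http.policy", "HTTP_ONLY"))
        ? "https://" : "http://";
  }
}
{code}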

> Improve yarn container command
> --
>
> Key: YARN-2301
> URL: https://issues.apache.org/jira/browse/YARN-2301
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Naganarasimha G R
>  Labels: usability
> Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2303.patch
>
>
> While running yarn container -list  command, some 
> observations:
> 1) the scheme (e.g. http/https  ) before LOG-URL is missing
> 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to 
> print as time format.
> 3) finish-time is 0 if container is not yet finished. May be "N/A"
> 4) May have an option to run as yarn container -list  OR  yarn 
> application -list-containers  also.  
> As attempt Id is not shown on console, this is easier for user to just copy 
> the appId and run it, may  also be useful for container-preserving AM 
> restart. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting

2014-11-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218607#comment-14218607
 ] 

Jian He commented on YARN-2878:
---

+1, committing. 
Thanks [~ashahab] for the patch and [~ajisakaa] for the review!

> Fix DockerContainerExecutor.apt.vm formatting
> -
>
> Key: YARN-2878
> URL: https://issues.apache.org/jira/browse/YARN-2878
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-1964-docs.patch
>
>
> The formatting on DockerContainerExecutor.apt.vm is off. Needs correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2014-11-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218598#comment-14218598
 ] 

Jian He commented on YARN-2865:
---

lgtm, the test failures look unrelated; re-kicking Jenkins.

> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles exception thrown while transitioningToActive and reset 
> activeServices. But it misses out clearing RMcontext apps/nodes details and 
> ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218595#comment-14218595
 ] 

Zhijie Shen commented on YARN-2375:
---

Looks good to me overall, except for some minor issues:

1. Can we add a test case in TestMRTimelineEventHandling to check the scenario where 
MAPREDUCE_JOB_EMIT_TIMELINE_DATA = true, TIMELINE_SERVICE_ENABLED = false, and 
MiniMRYarnCluster doesn't start the timeline server?

2. In ApplicationMaster.finish(), let's stop the timeline client?

3. Fix the indent issue in TimelineClientImpl#serviceInit().

4. {{LOG.info("Timeline server is (not) enabled");}} -> {{LOG.info("Timeline 
service is (not) enabled");}}? To be consistent with the log sentence in other 
places.


> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
> Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch
>
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-19 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-2877:

Comment: was deleted

(was: Linking to HADOOP-11317 to cover project-wide use.

I don't think yarn-common needs to explicitly declare a dependency on log4j, at 
least outside the test run. If you comment out that dependency, does everything 
still build?)

> Extend YARN to support distributed scheduling
> -
>
> Key: YARN-2877
> URL: https://issues.apache.org/jira/browse/YARN-2877
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on otherwise 
> idle resources on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates 
> (i.e., task execution time is much less compared to the time required for 
> obtaining a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-19 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-2877:

Comment: was deleted

(was: (ignore that comment, was for YARN-2875))

> Extend YARN to support distributed scheduling
> -
>
> Key: YARN-2877
> URL: https://issues.apache.org/jira/browse/YARN-2877
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on otherwise 
> idle resources on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates 
> (i.e., task execution time is much less compared to the time required for 
> obtaining a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2014-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218526#comment-14218526
 ] 

Hadoop QA commented on YARN-2800:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12682473/YARN-2800-20141119-1.patch
  against trunk revision 5bd048e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5881//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5881//console

This message is automatically generated.

> Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
> feature
> 
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
> YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
> YARN-2800-20141119-1.patch
>
>
> In the past, we had a MemoryNodeLabelsStore, mostly so that users could try this 
> feature without configuring where to store node labels on the file system. It 
> seems convenient for users to try this, but it actually causes a bad user 
> experience. A user may add/remove labels and edit capacity-scheduler.xml; after an 
> RM restart the labels will be gone (we store them in memory), and the RM cannot 
> start if some queue uses labels that don't exist in the cluster.
> As discussed, we should have an explicit way to let the user specify whether 
> he/she wants this feature or not. If node labels are disabled, any operation 
> trying to modify/use node labels will throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-19 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218507#comment-14218507
 ] 

Jonathan Eagles commented on YARN-2375:
---

This code looks good to me. [~zjshen], can you give a final review?

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
> Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch
>
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2014-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218439#comment-14218439
 ] 

Hadoop QA commented on YARN-2495:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12682448/YARN-2495.20141119-1.patch
  against trunk revision 5bd048e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5880//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5880//console

This message is automatically generated.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml or using script 
> suggested by [~aw])
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218405#comment-14218405
 ] 

Zhijie Shen commented on YARN-2879:
---

In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with new shuffle on NM;
3. Submitting via old client.

We will see the following console exception:
{code}
Console Log:
14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed 
successfully
java.lang.IllegalArgumentException: No enum constant 
org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES
at java.lang.Enum.valueOf(Enum.java:236)
at 
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
at 
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
at 
org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
at 
org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
at 
org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}

The problem is supposed to be fixed by MAPREDUCE-5831; however, it seems we haven't 
covered all the problematic code paths. Will file another Jira for it.
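
For reference, a minimal sketch of the kind of defensive counter lookup that would 
tolerate the unknown name (illustrative only, not the actual MAPREDUCE-5831 change; 
the class and method names below are made up):

{code}
// Illustrative sketch: tolerate counter names that the local (older) JobCounter
// enum does not know about, instead of propagating the IllegalArgumentException
// thrown by Enum.valueOf().
public final class LenientCounterLookup {

  /**
   * Returns the matching enum constant, or null when this client's enum
   * predates the counter (e.g. MB_MILLIS_REDUCES on a 2.2 client).
   */
  public static <T extends Enum<T>> T findCounterOrNull(Class<T> enumClass,
      String counterName) {
    try {
      return Enum.valueOf(enumClass, counterName);
    } catch (IllegalArgumentException unknownCounter) {
      // A newer server sent a counter this client has never heard of; skip it.
      return null;
    }
  }
}
{code}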

> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle 
> and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible 
> | OK | OK |
> | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
> | OK |
> Note that I've tried to run NM with both old and new version of shuffle 
> handler plus the runtime libs.
> In general, the compatibility looks good overall. There're a f

[jira] [Updated] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2879:
--
Description: 
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle 
and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new version of shuffle handler 
plus the runtime libs.

In general, compatibility looks good. There are a few issues related to MR, but 
they do not seem to be YARN issues. I'll post the individual problems in the 
follow-up comments.



  was:
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new version of shuffle handler 
plus the runtime libs.

In general, compatibility looks good. There are a few issues related to MR, but 
they do not seem to be YARN issues. I'll post the individual problems in the 
follow-up comments.




> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle and RT 2.2 || MR 2.2 + Shuffle 
> and RT 2.6 || MR 2.4 + Shuffle and RT 2.4 || MR 2.4 + Shuffle and RT 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible 
> | OK | OK |
> | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
> | OK |
> Note that I've tried to run NM with both old and new version of shuffle 
> handler plus the runtime libs.
> In general, compatibility looks good. There are a few issues related to MR, but 
> they do not seem to be YARN issues. I'll post the individual problems in the 
> follow-up comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2879:
--
Description: 
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new version of shuffle handler 
plus the runtime libs.

In general, compatibility looks good. There are a few issues related to MR, but 
they do not seem to be YARN issues. I'll post the individual problems in the 
follow-up comments.



  was:
Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new shuffle handler version.

In general, compatibility looks good. There are a few issues related to MR, but 
they do not seem to be YARN issues. I'll post the individual problems in the 
follow-up comments.




> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || 
> MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible 
> | OK | OK |
> | Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
> OK | OK |
> | Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
> | OK |
> Note that I've tried to run NM with both old and new version of shuffle 
> handler plus the runtime libs.
> In general, compatibility looks good. There are a few issues related to MR, but 
> they do not seem to be YARN issues. I'll post the individual problems in the 
> follow-up comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2014-11-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218400#comment-14218400
 ] 

Wangda Tan commented on YARN-2800:
--

Hi [~ozawa], 
Thanks for your comments,
bq. MemoryRMNodeLabelsManager for tests do nothing in new patch. How about 
renaming MemoryRMNodeLabelsManager to NullRMNodeLabelsManager for the 
consistency with RMStateStore?
Addressed.

bq. Maybe not related to this JIRA, but it's better to add testing RMRestart 
with NodeLabelManager to avoid regressions.
Agree, filed YARN-2880 to track this,

Uploaded a new patch,
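
For anyone following the enable/disable switch this JIRA adds, it essentially 
reduces to a guard like the sketch below. The property name, default, and exception 
type here are assumptions for the example; the committed patch may differ:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

// Illustrative guard only: refuse node-label operations when the feature is off.
public class NodeLabelsGuard {

  // Assumed property name and default, used only for this sketch.
  public static final String NODE_LABELS_ENABLED = "yarn.node-labels.enabled";

  private final boolean enabled;

  public NodeLabelsGuard(Configuration conf) {
    this.enabled = conf.getBoolean(NODE_LABELS_ENABLED, false);
  }

  /** Called at the top of every add/remove/replace-labels operation. */
  public void checkEnabled() throws IOException {
    if (!enabled) {
      throw new IOException("Node labels feature is disabled, set "
          + NODE_LABELS_ENABLED + "=true in yarn-site.xml to use it.");
    }
  }
}
{code}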

Thanks,
Wangda

> Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
> feature
> 
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
> YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
> YARN-2800-20141119-1.patch
>
>
> In the past, we had a MemoryNodeLabelStore, mostly for users to try this 
> feature without configuring where to store node labels on the file system. It 
> seems convenient for users to try this, but it actually causes a bad user 
> experience: users may add/remove labels and edit capacity-scheduler.xml, but 
> after an RM restart the labels are gone (we store them in memory), and the RM 
> cannot start if some queue uses labels that no longer exist in the cluster.
> As we discussed, we should have an explicit way to let users specify whether 
> they want this feature or not. If node labels are disabled, any operation 
> trying to modify/use node labels will throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature

2014-11-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2800:
-
Attachment: YARN-2800-20141119-1.patch

> Remove MemoryNodeLabelsStore and add a way to enable/disable node labels 
> feature
> 
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, 
> YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, 
> YARN-2800-20141119-1.patch
>
>
> In the past, we had a MemoryNodeLabelStore, mostly for users to try this 
> feature without configuring where to store node labels on the file system. It 
> seems convenient for users to try this, but it actually causes a bad user 
> experience: users may add/remove labels and edit capacity-scheduler.xml, but 
> after an RM restart the labels are gone (we store them in memory), and the RM 
> cannot start if some queue uses labels that no longer exist in the cluster.
> As we discussed, we should have an explicit way to let users specify whether 
> they want this feature or not. If node labels are disabled, any operation 
> trying to modify/use node labels will throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218388#comment-14218388
 ] 

Zhijie Shen edited comment on YARN-2879 at 11/19/14 7:50 PM:
-

a. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with either old or new shuffle handler on NM;
3. Submitting via new client.

We will see the following console exception:
{code}
14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String;
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)  
{code}

b. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with old shuffle on NM;
3. Submitting via old client.

We will see the following exception in the AM Log:
{code}
2014-11-17 15:09:06,157 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
application appattempt_1416264695865_0007_01
2014-11-17 15:09:06,436 FATAL [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.setPolicy(Lorg/apache/hadoop/http/HttpConfig$Policy;)V
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1364)
2014-11-17 15:09:06,439 INFO [Thread-1] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. 
Signaling RMCommunicator and JobHistoryEventHandler.
{code}

The two exceptions are actually the same problem, but using the old client 
prevents it from happening during app submission. Will file a separate Jira for it.
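
For illustration only, the skew can be demonstrated with a small reflection probe 
like the sketch below; this is not a fix, it just shows that the static 
getSchemePrefix() method the 2.2 bytecode links against is no longer resolvable:

{code}
// Illustration: probe whether the running Hadoop still exposes the static
// HttpConfig.getSchemePrefix() that MR 2.2 bytecode was compiled against.
public final class HttpConfigSkewProbe {

  public static boolean hasOldStaticApi() {
    try {
      Class<?> httpConfig = Class.forName("org.apache.hadoop.http.HttpConfig");
      // Present in 2.2; gone/changed in 2.6, which is what the
      // NoSuchMethodError above reports at link time.
      httpConfig.getMethod("getSchemePrefix");
      return true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("Old static HttpConfig API present: " + hasOldStaticApi());
  }
}
{code}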


was (Author: zjshen):
a. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with either old or new shuffle handler on NM;
3. Submitting via new client.

We will see the following console exception:
{code}
14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String;
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.

[jira] [Created] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled

2014-11-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2880:


 Summary: Add a test in TestRMRestart to make sure node labels will 
be recovered if it is enabled
 Key: YARN-2880
 URL: https://issues.apache.org/jira/browse/YARN-2880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan


As suggested by [~ozawa], 
[link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569].
 We should have such a test to make sure there will be no regression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218388#comment-14218388
 ] 

Zhijie Shen commented on YARN-2879:
---

a. In the following scenarios:

1. Either insecure or secure;
2. MR 2.2 with either old or new shuffle handler on NM;
3. Submitting via new client.

We will see the following console exception:
{code}
14/11/17 23:47:45 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/zjshen/.staging/zjshen/.staging/job_1416270549965_0014
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.getSchemePrefix()Ljava/lang/String;
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:302)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)  
{code}

b. In the following scenarios:
1. Either insecure or secure;
2. MR 2.2 with old shuffle on NM;
3. Submitting via old client.

We will see the following exception in the AM Log:
{code}
2014-11-17 15:09:06,157 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
application appattempt_1416264695865_0007_01
2014-11-17 15:09:06,436 FATAL [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: 
org.apache.hadoop.http.HttpConfig.setPolicy(Lorg/apache/hadoop/http/HttpConfig$Policy;)V
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1364)
2014-11-17 15:09:06,439 INFO [Thread-1] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. 
Signaling RMCommunicator and JobHistoryEventHandler.
{code}

The two exceptions are actually the same problem, but using the old client 
prevents it from happening during app submission. Will file a separate Jira for it.

> Compatibility validation between YARN 2.2/2.4 and 2.6
> -
>
> Key: YARN-2879
> URL: https://issues.apache.org/jira/browse/YARN-2879
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Recently, I did some simple backward compatibility experiments. Basically, 
> I've taken the following 2 steps:
> 1. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *new* Hadoop (2.6) client.
> 2. Deploy the application (MR and DistributedShell) that is compiled against 
> *old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
> submitted via *old* Hadoop (2.2/2.4) client that comes with the app.
> I've tried these 2 steps on both insecure and secure cluster. Here's a short 
> summary:
> || || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || 
> MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
> | Insecure | New Client | OK | OK | Client Incompatible | Client Incomp

[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-11-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218381#comment-14218381
 ] 

Wangda Tan commented on YARN-1963:
--

[~sunilg],
Thanks for reply,

bq. All we need a YarnClient implementation for taking this config and setting 
to ApplicationSubmissionContext. ( Something similar to queue name which this 
app is submitted to ).
Yes, that will be helpful. I want to make sure they're not in YarnClient now 
(including the queue)? I didn't see any related code in YarnClient.
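
To make this concrete, a minimal sketch of what the client-side plumbing could look 
like, using the priority field that ApplicationSubmissionContext already carries 
(the queue name and the value 5 below are just examples; how the scheduler 
interprets the value is what this JIRA defines):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: attach a queue and an application priority to the submission context.
public class PrioritySubmissionSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
    context.setQueue("default");                  // example queue
    context.setPriority(Priority.newInstance(5)); // example priority value

    // ... set the AM container spec, resource request, etc., then:
    // yarnClient.submitApplication(context);

    yarnClient.stop();
  }
}
{code}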

bq. The idea sounds good. The reason for specifying each label needed for a 
queue is because admin can specify the labels applicable for a queue. With high 
priority, we may always end up having default acceptance of lower priorities. 
How do you feel about having this as a range "low-high"
Instead of having a low-high range, I'd prefer highest + default priority: the 
admin can specify the highest priority and the default priority per queue/user.

bq. I have a use case scenario here. There are few applications running in a 
queue from 4 different users (sub...
I understood the use case here, but I think an easier way may be to not change 
the definition of user limit, e.g. have a preemption mechanism so that 
higher-priority applications can take resources from lower-priority applications. 
Dividing the user limit by priority will add extra complexity in both 
implementation and configuration.

bq.  I suggest to add preemption within queue considering priority. ... +1. 
Already filed a subjira for this.
The preemption I mentioned here is not YARN-2009; it is to support the previous 
use case you mentioned: we can keep user-limit as-is and still ensure that 
higher-priority applications get resources. That should be possible :)

Thanks,
Wangda

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: YARN Application Priorities Design.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2879:
-

 Summary: Compatibility validation between YARN 2.2/2.4 and 2.6
 Key: YARN-2879
 URL: https://issues.apache.org/jira/browse/YARN-2879
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Recently, I did some simple backward compatibility experiments. Basically, I've 
taken the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure cluster. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 
2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK 
| OK |
| secure | New Client | OK | OK | Client Incompatible | Client Incompatible | 
OK | OK |
| secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | 
OK |

Note that I've tried to run NM with both old and new shuffle handler version.

In general, compatibility looks good. There are a few issues related to MR, but 
they do not seem to be YARN issues. I'll post the individual problems in the 
follow-up comments.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup

2014-11-19 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2729:

Attachment: YARN-2729.20141120-1.patch

Updating patch with [~wangda]'s review comments

> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup
> ---
>
> Key: YARN-2729
> URL: https://issues.apache.org/jira/browse/YARN-2729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, 
> YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch
>
>
> Support script based NodeLabelsProvider Interface in Distributed Node Label 
> Configuration Setup . 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2014-11-19 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2495:

Attachment: YARN-2495.20141119-1.patch

Updating patch with [~wangda]'s review comments... 

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow admins to specify labels on each NM; this covers:
> - Users can set labels on each NM (by setting yarn-site.xml or using a script as 
> suggested by [~aw])
> - The NM will send labels to the RM via the ResourceTracker API
> - The RM will set labels in the NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218254#comment-14218254
 ] 

Hadoop QA commented on YARN-2375:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682439/YARN-2375.patch
  against trunk revision 5bd048e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5879//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5879//console

This message is automatically generated.

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
> Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch
>
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-11-19 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2356:
--
Attachment: 0001-YARN-2356.patch

Thank you [~jianhe], [~devaraj.k] for the comments.

I have updated the patch as per the comments.

However, I have a point to mention regarding the comment below.

bq. can we return the exitCode directly from printXXXReport() methods
I could see that this return of exit code from each of the printXXXReport() was 
causing nested if in the caller side, and was becoming less readable. Also 
*killApplication* is already rethrowing exception and handling similar way.. 
Kindly share your thoughts on this.
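
To make the structure concrete, a rough sketch of the rethrow-and-handle-once shape 
I mean (names simplified and illustrative, not the exact patch code):

{code}
import java.io.IOException;
import java.io.PrintStream;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch: printApplicationReport() rethrows, and the caller maps the not-found
// case to a short message plus a non-zero exit code in one place, so there is
// no nested if per report method.
public class StatusCliSketch {

  private final YarnClient client;
  private final PrintStream sysout;

  public StatusCliSketch(YarnClient client, PrintStream sysout) {
    this.client = client;
    this.sysout = sysout;
  }

  private void printApplicationReport(ApplicationId appId)
      throws YarnException, IOException {
    // Throws ApplicationNotFoundException for an unknown application.
    ApplicationReport report = client.getApplicationReport(appId);
    sysout.println(report);
  }

  public int run(ApplicationId appId) throws YarnException, IOException {
    try {
      printApplicationReport(appId);
    } catch (ApplicationNotFoundException e) {
      sysout.println("Application with id '" + appId
          + "' doesn't exist in RM or Timeline Server.");
      return -1;
    }
    return 0;
  }
}
{code}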

> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-2356.patch, Yarn-2356.1.patch
>
>
> *yarn application -status* or *applicationattempt -status* or *container 
> status* commands can suppress exception such as ApplicationNotFound, 
> ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in 
> RM or History Server. 
> For example, below exception can be suppressed better
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76)
> Cause

[jira] [Commented] (YARN-2299) inconsistency at identifying node

2014-11-19 Thread Bruno Alexandre Rosa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218207#comment-14218207
 ] 

Bruno Alexandre Rosa commented on YARN-2299:


Which*

> inconsistency at identifying node
> -
>
> Key: YARN-2299
> URL: https://issues.apache.org/jira/browse/YARN-2299
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
>
> If the port in "yarn.nodemanager.address" is not specified on the NM, the NM will 
> choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
> restart) and is then restarted within "yarn.nm.liveness-monitor.expiry-interval-ms", 
> "host:port1" and "host:port2" will both be present in "Active Nodes" on the WebUI 
> for a while; after host:port1 expires, we get host:port1 in "Lost Nodes" and 
> host:port2 in "Active Nodes". If the NM dies ungracefully again, we get only 
> host:port1 in "Lost Nodes"; "host:port2" is neither in "Active Nodes" nor in 
> "Lost Nodes".
> In another case, two NMs are running on the same host (miniYarnCluster or other 
> test purposes); if both of them are lost, we get only one entry in "Lost Nodes" 
> on the WebUI.
> In both cases, the sum of "Active Nodes" and "Lost Nodes" is not the number of 
> nodes we expect.
> The root cause is an inconsistency in how we decide that two nodes are identical.
> When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains 
> the port, so two nodes with the same host but different ports are considered 
> different nodes.
> But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the 
> host, so two nodes with the same host but different ports are considered identical.
> To fix the inconsistency, we should differentiate the two cases below and be 
> consistent for both of them:
>  - intentionally multiple NMs per host
>  - NM instances one after another on the same host
> Two possible solutions:
> 1) Introduce a boolean config like "one-node-per-host" (default "true"), and use 
> the host to differentiate nodes on the RM if it's true.
> 2) Make it mandatory to have a valid port in the "yarn.nodemanager.address" config. 
> In this situation, NM instances one after another on the same host will have the 
> same NodeId, while intentionally multiple NMs per host will have different NodeIds.
> Personally I prefer option 1 because it's easier for users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2299) inconsistency at identifying node

2014-11-19 Thread Bruno Alexandre Rosa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218195#comment-14218195
 ] 

Bruno Alexandre Rosa commented on YARN-2299:


What are the affected versions?

> inconsistency at identifying node
> -
>
> Key: YARN-2299
> URL: https://issues.apache.org/jira/browse/YARN-2299
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
>
> If the port in "yarn.nodemanager.address" is not specified on the NM, the NM will 
> choose a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
> restart) and is then restarted within "yarn.nm.liveness-monitor.expiry-interval-ms", 
> "host:port1" and "host:port2" will both be present in "Active Nodes" on the WebUI 
> for a while; after host:port1 expires, we get host:port1 in "Lost Nodes" and 
> host:port2 in "Active Nodes". If the NM dies ungracefully again, we get only 
> host:port1 in "Lost Nodes"; "host:port2" is neither in "Active Nodes" nor in 
> "Lost Nodes".
> In another case, two NMs are running on the same host (miniYarnCluster or other 
> test purposes); if both of them are lost, we get only one entry in "Lost Nodes" 
> on the WebUI.
> In both cases, the sum of "Active Nodes" and "Lost Nodes" is not the number of 
> nodes we expect.
> The root cause is an inconsistency in how we decide that two nodes are identical.
> When we manage active nodes (RMContextImpl.nodes), we use NodeId, which contains 
> the port, so two nodes with the same host but different ports are considered 
> different nodes.
> But when we manage inactive nodes (RMContextImpl.inactiveNodes), we only use the 
> host, so two nodes with the same host but different ports are considered identical.
> To fix the inconsistency, we should differentiate the two cases below and be 
> consistent for both of them:
>  - intentionally multiple NMs per host
>  - NM instances one after another on the same host
> Two possible solutions:
> 1) Introduce a boolean config like "one-node-per-host" (default "true"), and use 
> the host to differentiate nodes on the RM if it's true.
> 2) Make it mandatory to have a valid port in the "yarn.nodemanager.address" config. 
> In this situation, NM instances one after another on the same host will have the 
> same NodeId, while intentionally multiple NMs per host will have different NodeIds.
> Personally I prefer option 1 because it's easier for users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2014-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218171#comment-14218171
 ] 

Hadoop QA commented on YARN-2865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682428/YARN-2865.1.patch
  against trunk revision 5bd048e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5878//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5878//console

This message is automatically generated.

> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles exception thrown while transitioningToActive and reset 
> activeServices. But it misses out clearing RMcontext apps/nodes details and 
> ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5

2014-11-19 Thread Tim Robertson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218160#comment-14218160
 ] 

Tim Robertson commented on YARN-2875:
-

Sadly no.  

It is used in the 
[ContainerLogAppender|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java#L37]
 and 
[ContainerRollingLogAppender|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java#L34].

I tried to remove it and compile using the [log4j-over-slf4j v1.7.7 
bridge|http://search.maven.org/#artifactdetails%7Corg.slf4j%7Clog4j-over-slf4j%7C1.7.7%7Cjar]
 but that fails because the SLF4J classes are not the same API.  For example 
[the SLF4J RollingFileAppender| 
https://github.com/qos-ch/slf4j/blob/master/log4j-over-slf4j/src/main/java/org/apache/log4j/RollingFileAppender.java]
 does not implement  methods like setFile(), setAppend() etc.  The build will 
fail with the following:

{code}
[INFO] -
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[41,6]
 error: cannot find symbol
[ERROR]   symbol:   method setFile(String)
  location: class ContainerRollingLogAppender
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[42,6]
 error: cannot find symbol
[ERROR]   symbol:   method setAppend(boolean)
  location: class ContainerRollingLogAppender
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[43,11]
 error: cannot find symbol
[ERROR]   symbol: method activateOptions()
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[38,2]
 error: method does not override or implement a method from a supertype
[ERROR] 
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[49,8]
 error: cannot find symbol
[ERROR]   symbol:   variable qw
  location: class ContainerRollingLogAppender
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerRollingLogAppender.java:[50,6]
 error: cannot find symbol
[ERROR]   symbol:   variable qw
  location: class ContainerRollingLogAppender
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[37,7]
 error: no suitable constructor found for FileAppender()
[ERROR] constructor 
FileAppender.FileAppender(Layout,String,boolean,boolean,int) is not applicable
  (actual and formal argument lists differ in length)
constructor FileAppender.FileAppender(Layout,String,boolean) is not 
applicable
  (actual and formal argument lists differ in length)
constructor FileAppender.FileAppender(Layout,String) is not applicable
  (actual and formal argument lists differ in length)
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[52,6]
 error: cannot find symbol
[ERROR]   symbol:   method setFile(String)
  location: class ContainerLogAppender
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[53,6]
 error: cannot find symbol
[ERROR]   symbol:   method setAppend(boolean)
  location: class ContainerLogAppender
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[65,13]
 error: cannot find symbol
[ERROR]   symbol: method append(LoggingEvent)
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[58,2]
 error: method does not override or implement a method from a supertype
[ERROR] 
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[77,8]
 error: cannot find symbol
[ERROR]   symbol:   variable qw
  location: class ContainerLogAppender
/Users/tim/dev/git/hadoop/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java:[78,6]
 error: cannot find symbol
[ERROR] 

[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-19 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2375:

Attachment: YARN-2375.patch

Attaching updated patch

> Allow enabling/disabling timeline server per framework
> --
>
> Key: YARN-2375
> URL: https://issues.apache.org/jira/browse/YARN-2375
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Mit Desai
> Attachments: YARN-2375.patch, YARN-2375.patch, YARN-2375.patch
>
>
> This JIRA is to remove the ats enabled flag check within the 
> TimelineClientImpl. Example where this fails is below.
> While running secure timeline server with ats flag set to disabled on 
> resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2014-11-19 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218044#comment-14218044
 ] 

Rohith commented on YARN-2865:
--

Attached a patch; the changes from the previous patch are:
1. Fixed Karthik's comment: added a comment for RMActiveServiceContext and made 
the annotations @Private and @Unstable.
2. Fixed Jian He's comment: I use rmcontext only to set services.

Please review the patch.

> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles exception thrown while transitioningToActive and reset 
> activeServices. But it misses out clearing RMcontext apps/nodes details and 
> ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2014-11-19 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2865:
-
Attachment: YARN-2865.1.patch

> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles exception thrown while transitioningToActive and reset 
> activeServices. But it misses out clearing RMcontext apps/nodes details and 
> ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218014#comment-14218014
 ] 

Hudson commented on YARN-2870:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #10 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/10/])
YARN-2870. Updated the command to run the timeline server in the document. 
Contributed by Masatake Iwasaki. (zjshen: rev 
ef38fb9758f230c3021e70b749d7a11f8bac03f5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm


> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but the deprecated name 
> still appears in the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2157) Document YARN metrics

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218017#comment-14218017
 ] 

Hudson commented on YARN-2157:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #10 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/10/])
YARN-2157. Added YARN metrics in the documentation. Contributed by Akira AJISAKA 
(jianhe: rev 90a968d6757511b6d89538516db0e699129d854c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm


> Document YARN metrics
> -
>
> Key: YARN-2157
> URL: https://issues.apache.org/jira/browse/YARN-2157
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 2.7.0
>
> Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch
>
>
> YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting

2014-11-19 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-2878:

 Target Version/s: 2.6.1
Affects Version/s: 2.6.0

> Fix DockerContainerExecutor.apt.vm formatting
> -
>
> Key: YARN-2878
> URL: https://issues.apache.org/jira/browse/YARN-2878
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-1964-docs.patch
>
>
> The formatting on DockerContainerExecutor.apt.vm is off. Needs correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting

2014-11-19 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218008#comment-14218008
 ] 

Akira AJISAKA commented on YARN-2878:
-

Applied the patch and compiled the doc. The doc looks good to me, +1 (non-binding).

> Fix DockerContainerExecutor.apt.vm formatting
> -
>
> Key: YARN-2878
> URL: https://issues.apache.org/jira/browse/YARN-2878
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-1964-docs.patch
>
>
> The formatting on DockerContainerExecutor.apt.vm is off. Needs correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2157) Document YARN metrics

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217998#comment-14217998
 ] 

Hudson commented on YARN-2157:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1962 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1962/])
YARN-2157. Added YARN metrics in the documentation. Contributed by Akira AJISAKA 
(jianhe: rev 90a968d6757511b6d89538516db0e699129d854c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm


> Document YARN metrics
> -
>
> Key: YARN-2157
> URL: https://issues.apache.org/jira/browse/YARN-2157
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 2.7.0
>
> Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch
>
>
> YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217995#comment-14217995
 ] 

Hudson commented on YARN-2870:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1962 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1962/])
YARN-2870. Updated the command to run the timeline server in the document. 
Contributed by Masatake Iwasaki. (zjshen: rev 
ef38fb9758f230c3021e70b749d7a11f8bac03f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but the deprecated name 
> still appears in the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2878) Fix DockerContainerExecutor.apt.vm formatting

2014-11-19 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-2878:

Component/s: documentation

> Fix DockerContainerExecutor.apt.vm formatting
> -
>
> Key: YARN-2878
> URL: https://issues.apache.org/jira/browse/YARN-2878
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-1964-docs.patch
>
>
> The formatting on DockerContainerExecutor.apt.vm is off. Needs correction



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217946#comment-14217946
 ] 

Hudson commented on YARN-2870:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1938 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1938/])
YARN-2870. Updated the command to run the timeline server in the document. 
Contributed by Masatake Iwasaki. (zjshen: rev 
ef38fb9758f230c3021e70b749d7a11f8bac03f5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm


> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but the deprecated name 
> still appears in the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2157) Document YARN metrics

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217949#comment-14217949
 ] 

Hudson commented on YARN-2157:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1938 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1938/])
YARN-2157. Added YARN metrics in the documentation. Contributed by Akira AJISAKA 
(jianhe: rev 90a968d6757511b6d89538516db0e699129d854c)
* hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Document YARN metrics
> -
>
> Key: YARN-2157
> URL: https://issues.apache.org/jira/browse/YARN-2157
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 2.7.0
>
> Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch
>
>
> YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2157) Document YARN metrics

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217931#comment-14217931
 ] 

Hudson commented on YARN-2157:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #10 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/10/])
YARN-2157. Added YARN metrics in the documentation. Contributed by Akira AJISAKA 
(jianhe: rev 90a968d6757511b6d89538516db0e699129d854c)
* hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Document YARN metrics
> -
>
> Key: YARN-2157
> URL: https://issues.apache.org/jira/browse/YARN-2157
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 2.7.0
>
> Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch
>
>
> YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217928#comment-14217928
 ] 

Hudson commented on YARN-2870:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #10 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/10/])
YARN-2870. Updated the command to run the timeline server in the document. 
Contributed by Masatake Iwasaki. (zjshen: rev 
ef38fb9758f230c3021e70b749d7a11f8bac03f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but the deprecated name 
> still appears in the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217720#comment-14217720
 ] 

Hudson commented on YARN-2870:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #748 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/748/])
YARN-2870. Updated the command to run the timeline server in the document. 
Contributed by Masatake Iwasaki. (zjshen: rev 
ef38fb9758f230c3021e70b749d7a11f8bac03f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but the deprecated name 
> still appears in the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2157) Document YARN metrics

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217723#comment-14217723
 ] 

Hudson commented on YARN-2157:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #748 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/748/])
YARN-2157. Added YARN metrics in the documentation. Contributed by Akira AJISAKA 
(jianhe: rev 90a968d6757511b6d89538516db0e699129d854c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm


> Document YARN metrics
> -
>
> Key: YARN-2157
> URL: https://issues.apache.org/jira/browse/YARN-2157
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 2.7.0
>
> Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch
>
>
> YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2157) Document YARN metrics

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217713#comment-14217713
 ] 

Hudson commented on YARN-2157:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #10 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/10/])
YARN-2157. Added YARN metrics in the documentation. Contributed by Akira AJISAKA 
(jianhe: rev 90a968d6757511b6d89538516db0e699129d854c)
* hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Document YARN metrics
> -
>
> Key: YARN-2157
> URL: https://issues.apache.org/jira/browse/YARN-2157
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Fix For: 2.7.0
>
> Attachments: YARN-2157.2.patch, YARN-2157.3.patch, YARN-2157.patch
>
>
> YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2870) Update examples in document of Timeline Server

2014-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217710#comment-14217710
 ] 

Hudson commented on YARN-2870:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #10 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/10/])
YARN-2870. Updated the command to run the timeline server in the document. 
Contributed by Masatake Iwasaki. (zjshen: rev 
ef38fb9758f230c3021e70b749d7a11f8bac03f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/TimelineServer.apt.vm
* hadoop-yarn-project/CHANGES.txt


> Update examples in document of Timeline Server
> --
>
> Key: YARN-2870
> URL: https://issues.apache.org/jira/browse/YARN-2870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, timelineserver
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: YARN-2870.1.patch
>
>
> YARN-1982 renamed historyserver to timelineserver but the deprecated name 
> still appears in the docs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2356) yarn status command for non-existent application/application attempt/container is too verbose

2014-11-19 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217706#comment-14217706
 ] 

Devaraj K commented on YARN-2356:
-

Sorry for coming in late here. Thanks [~sunilg] for the patch and thanks 
[~jianhe] for the review. Overall the patch looks good. In addition to 
[~jianhe]'s comment, I have two observations.

1. Instead of rethrowing and catching the exception to determine the exitCode, 
can we return the exitCode directly from the printXXXReport() methods? (A 
rough sketch follows these observations.)
2. In the newly added tests, I think there is no need to catch the exception 
and call Assert.fail() explicitly; JUnit will fail the test when an exception 
arises.
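
A minimal sketch of observation 1, with illustrative method and field names 
(not the actual ApplicationCLI code):

{code}
// Sketch only: printApplicationReport() returns the exit code itself instead
// of rethrowing the not-found exception, so the caller can return it directly
// without a nested try-catch.
private int printApplicationReport(String applicationId)
    throws YarnException, IOException {
  ApplicationReport appReport;
  try {
    appReport = client.getApplicationReport(
        ConverterUtils.toApplicationId(applicationId));
  } catch (ApplicationNotFoundException e) {
    sysout.println("Application with id '" + applicationId
        + "' doesn't exist in RM.");
    return -1;
  }
  // ... print the report fields as today ...
  return 0;
}
{code}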


> yarn status command for non-existent application/application 
> attempt/container is too verbose 
> --
>
> Key: YARN-2356
> URL: https://issues.apache.org/jira/browse/YARN-2356
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: Yarn-2356.1.patch
>
>
> *yarn application -status* or *applicationattempt -status* or *container 
> status* commands can suppress exception such as ApplicationNotFound, 
> ApplicationAttemptNotFound and ContainerNotFound for non-existent entries in 
> RM or History Server. 
> For example, below exception can be suppressed better
> sunildev@host-a:~/hadoop/hadoop/bin> ./yarn application -status 
> application_1402668848165_0015
> No GC_PROFILE is given. Defaults to medium.
> 14/07/25 16:21:45 INFO client.RMProxy: Connecting to ResourceManager at 
> /10.18.40.77:45022
> Exception in thread "main" 
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1402668848165_0015' doesn't exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:285)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:607)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1626)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at $Proxy12.getApplicationReport(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.printApplicationReport(ApplicationCLI.java:428)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:153)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76)
> Caused by: 
> 

[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-11-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217672#comment-14217672
 ] 

Sunil G commented on YARN-1963:
---

Hi Wangda, 

Thanks for sharing your comments.

bq. Does this means, any YARN application doesn't need change a line of their 
code,
yarn.app.priority can be passed from the client side. If the client sets the 
priority value from this config on the ApplicationSubmissionContext, then the 
RM can read it from there. All we need is a YarnClient change that reads this 
config and sets it on the ApplicationSubmissionContext (something similar to 
the queue name that the app is submitted to).
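
A minimal sketch of that flow; the "yarn.app.priority" key and its default 
are illustrative placeholders, not an existing YARN property:

{code}
// Client-side sketch: read the priority from configuration and carry it in
// the ApplicationSubmissionContext, the same way the queue name travels.
Configuration conf = new YarnConfiguration();
int priorityValue = conf.getInt("yarn.app.priority", 0);   // illustrative key

YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(conf);
yarnClient.start();

YarnClientApplication app = yarnClient.createApplication();
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
appContext.setQueue("default");
appContext.setPriority(Priority.newInstance(priorityValue));
// (AM container spec, resource request, etc. omitted for brevity)

yarnClient.submitApplication(appContext);
{code}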

bq. Specify only highest priority for queue and user
The idea sounds good. The reason for specifying each label needed for a queue 
is that the admin can then control which labels are applicable for that queue. 
With only the highest priority specified, we may always end up implicitly 
accepting all lower priorities. How do you feel about having this as a range 
"low-high"?
{noformat}
cluster labels {very_high, high, medium, low}
yarn.scheduler.root..priority_label=low-high
yarn.scheduler.capacity.root..high.acl=user1,user2
yarn.scheduler.capacity.root..low.acl=user3,user4
{noformat}

This was the intention. Please share your thoughts [~vinodkv] [~gp.leftnoteasy] 
 

bq. I think we shouldn't consider user limit within priority level
I have a use-case scenario here. A few applications are running in a queue 
from 4 different users (submitted at priority level low) and the user-limit 
factor is 20. A 5th user has the ACL for submitting high-priority 
applications. Because of the user limit, that user can get only 20% of the 
queue at most for the high-priority apps. These high-priority apps submitted 
by user5 may need more resources, which in turn will be rejected by the 
user-limit check. What do you think about this use case?
 
bq. I suggest to add preemption within queue considering priority.
+1. Already filed a sub-JIRA for this.

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: YARN Application Priorities Design.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2301) Improve yarn container command

2014-11-19 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217671#comment-14217671
 ] 

Naganarasimha G R commented on YARN-2301:
-

Hi [~jianhe],
Thanks for reviewing, but I need a few more clarifications:
 bq. we can just use containerReport.getFinishTime(), as it internally is 
checking “>0” already.
 This modification is to support the third issue you mentioned ({{3. 
finish-time is 0 if container is not yet finished. May be "N/A"}}), and I did 
not find where exactly ">0" is being checked internally, as there are no such 
checks in the PBImpl. Can you please point out more precisely where this check 
is done internally?
 bq. the scheme could be https also, we should use 
WebAppUtils#getHttpSchemePrefix
 I kept the scheme hard-coded to http for the following reasons (a small 
sketch at the end of this comment illustrates the first one):
 1. We only get the container's HTTP address, and we append the scheme to it in 
{{WebAppUtils.getRunningLogURL(container.getNodeHttpAddress()}}. So 
irrespective of which scheme we set, the port will be the HTTP port of the 
node where this container ran; it would not be ideal to pair an HTTPS scheme 
with the node's HTTP port. If we need to correct this, then we need to make 
Container.newInstance accept an https URL as well, which will impact a lot of 
places.
 2. WebAppUtils#getHttpSchemePrefix requires a Configuration object; since that 
reference is not available in RMContainerImpl, I would need to create a new 
YarnConfiguration in the RMContainerImpl constructor and keep it. ??may be a 
trivial issue??

So I kept the changes simple. Please share your opinion on this.
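
To illustrate reason 1, a rough sketch with illustrative names and URL path 
(not the exact RMContainerImpl code):

{code}
// The address we have is the node's HTTP address (host:HTTP-port), so even if
// the prefix came from WebAppUtils#getHttpSchemePrefix(conf) and was
// "https://", the port in the address would still be the HTTP one.
String nodeHttpAddress = container.getNodeHttpAddress();  // e.g. "nm-host:8042"
String logUrl = "http://" + nodeHttpAddress
    + "/node/containerlogs/" + containerId + "/" + user;  // illustrative path
{code}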

> Improve yarn container command
> --
>
> Key: YARN-2301
> URL: https://issues.apache.org/jira/browse/YARN-2301
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Naganarasimha G R
>  Labels: usability
> Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2303.patch
>
>
> While running yarn container -list  command, some 
> observations:
> 1) the scheme (e.g. http/https  ) before LOG-URL is missing
> 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to 
> print as time format.
> 3) finish-time is 0 if container is not yet finished. May be "N/A"
> 4) May have an option to run as yarn container -list  OR  yarn 
> application -list-containers  also.  
> As attempt Id is not shown on console, this is easier for user to just copy 
> the appId and run it, may  also be useful for container-preserving AM 
> restart. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5

2014-11-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217669#comment-14217669
 ] 

Steve Loughran commented on YARN-2875:
--

Linking to HADOOP-11317 to cover project-wide use.
I don't think yarn-common needs to explicitly declare a dependency on log4j, at 
least outside the test run. If you comment out that dependency, does everything 
still build?

> Bump SLF4J to 1.7.7 from 1.7.5 
> ---
>
> Key: YARN-2875
> URL: https://issues.apache.org/jira/browse/YARN-2875
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tim Robertson
>Priority: Minor
>
> hadoop-yarn-common [uses log4j 
> directly|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml#L167],
>  and when trying to redirect that through an SLF4J bridge, version 1.7.5 has 
> issues due to its use of AppenderSkeleton, which is missing in 
> log4j-over-slf4j version 1.7.5.
> This is documented on the [1.7.6 release 
> notes|http://www.slf4j.org/news.html] but 1.7.7 should be suitable.
> This is applicable to all the projects using Hadoop motherpom, but Yarn 
> appears to be bringing Log4J in, rather than coding to the SLF4J API.
> The issue shows in the logs as follows in Yarn MR apps, which is painful to 
> diagnose.
> {code}
> WARN  [2014-11-18 09:58:06,390+0100] [main] 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Caught exception in 
> callback postStart
> java.lang.reflect.InvocationTargetException: null
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[na:1.7.0_71]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> ~[na:1.7.0_71]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.7.0_71]
>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
>  ~[job.jar:0.22-SNAPSHOT]
>   at com.sun.proxy.$Proxy2.postStart(Unknown Source) [na:na]
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
>  [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:157)
>  [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
>  [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
>  [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1036)
>  [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
> [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478) 
> [job.jar:0.22-SNAPSHOT]
>   at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_71]
>   at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>  [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474)
>  [job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407) 
> [job.jar:0.22-SNAPSHOT]
> Caused by: java.lang.IncompatibleClassChangeError: Implementing class
>   at java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_71]
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 
> ~[na:1.7.0_71]
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
> ~[na:1.7.0_71]
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 
> ~[na:1.7.0_71]
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 
> ~[na:1.7.0_71]
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361) ~[na:1.7.0_71]
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_71]
>   at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_71]
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 
> ~[na:1.7.0_71]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_71]
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 
> ~[na:1.7.0_71]
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_71]
>   at 
> org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:183)
>  ~[job.jar:0.22-SNAPSHOT]
>   at 
> org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:100) 
> ~[job.jar:0.22-S

[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217668#comment-14217668
 ] 

Steve Loughran commented on YARN-2877:
--

(ignore that comment, was for YARN-2875)

> Extend YARN to support distributed scheduling
> -
>
> Key: YARN-2877
> URL: https://issues.apache.org/jira/browse/YARN-2877
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates 
> (i.e., task execution time is much less compared to the time required for 
> obtaining a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling

2014-11-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217667#comment-14217667
 ] 

Steve Loughran commented on YARN-2877:
--

Linking to HADOOP-11317 to cover project-wide use.

I don't think yarn-common needs to explicitly declare a dependency on log4j, at 
least outside the test run. If you comment out that dependency, does everything 
still build?

> Extend YARN to support distributed scheduling
> -
>
> Key: YARN-2877
> URL: https://issues.apache.org/jira/browse/YARN-2877
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Sriram Rao
>
> This is an umbrella JIRA that proposes to extend YARN to support distributed 
> scheduling.  Briefly, some of the motivations for distributed scheduling are 
> the following:
> 1. Improve cluster utilization by opportunistically executing tasks on 
> otherwise idle resources on individual machines.
> 2. Reduce allocation latency.  Tasks where the scheduling time dominates 
> (i.e., task execution time is much less compared to the time required for 
> obtaining a container from the RM).
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2875) Bump SLF4J to 1.7.7 from 1.7.5

2014-11-19 Thread Tim Robertson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated YARN-2875:

Description: 
hadoop-yarn-common [uses log4j 
directly|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml#L167],
 and when trying to redirect that through an SLF4J bridge, version 1.7.5 has 
issues due to its use of AppenderSkeleton, which is missing in log4j-over-slf4j 
version 1.7.5.

This is documented on the [1.7.6 release notes|http://www.slf4j.org/news.html] 
but 1.7.7 should be suitable.

This is applicable to all the projects using Hadoop motherpom, but Yarn appears 
to be bringing Log4J in, rather than coding to the SLF4J API.

The issue shows in the logs as follows in Yarn MR apps, which is painful to 
diagnose.
{code}
WARN  [2014-11-18 09:58:06,390+0100] [main] 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Caught exception in callback 
postStart
java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[na:1.7.0_71]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
~[na:1.7.0_71]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.7.0_71]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:290)
 ~[job.jar:0.22-SNAPSHOT]
at com.sun.proxy.$Proxy2.postStart(Unknown Source) [na:na]
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:185)
 [job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:157)
 [job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
 [job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
 [job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1036)
 [job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) 
[job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1478) 
[job.jar:0.22-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) 
[na:1.7.0_71]
at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 [job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474)
 [job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407) 
[job.jar:0.22-SNAPSHOT]
Caused by: java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method) ~[na:1.7.0_71]
at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 
~[na:1.7.0_71]
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
~[na:1.7.0_71]
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 
~[na:1.7.0_71]
at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 
~[na:1.7.0_71]
at java.net.URLClassLoader$1.run(URLClassLoader.java:361) ~[na:1.7.0_71]
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_71]
at java.security.AccessController.doPrivileged(Native Method) 
[na:1.7.0_71]
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) 
~[na:1.7.0_71]
at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_71]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) 
~[na:1.7.0_71]
at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_71]
at 
org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:183)
 ~[job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:100) 
~[job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
 ~[job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
 ~[job.jar:0.22-SNAPSHOT]
at 
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
 ~[job.jar:0.22-SNAPSHOT]
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
 ~[na:1.7.0_71]
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
 ~[na:1

[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-11-19 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.2.patch

Go ahead and allow cores to be part of the am resource limit...

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.2.patch
>
>
> Currently, number of AM in leaf queue will be calculated in following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when submit new application to RM, it will check if an app can be 
> activated in following way:
> {code}
> for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); 
>  i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
> break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
> user.activateApplication();
> activeApplications.add(application);
> i.remove();
> LOG.info("Application " + application.getApplicationId() +
> " from user: " + application.getUser() + 
> " activated in queue: " + getQueueName());
>   }
> }
> {code}
> An example is,
> If a queue has capacity = 1G, max_am_resource_percent  = 0.2, the maximum 
> resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be 
> launched is 200, and if user uses 5M for each AM (> minimum_allocation). All 
> apps can still be activated, and it will occupy all resource of a queue 
> instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)