[jira] [Commented] (YARN-4582) Label-related invalid resource request exception should be able to properly handled by application

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093513#comment-15093513
 ] 

Hadoop QA commented on YARN-4582:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 8s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
57s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 36s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 5s 
{color} | {color:red} Patch generated 3 new checkstyle issues in root (total 
was 132, now 135). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
57s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 2s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 39s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 31s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 12s 
{color} | {color:green} hadoop-mapreduce-client-app in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (YARN-4582) Label-related invalid resource request exception should be able to properly handled by application

2016-01-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093524#comment-15093524
 ] 

Bibin A Chundatt commented on YARN-4582:


Thank you for the review and commit, [~leftnoteasy] 

> Label-related invalid resource request exception should be able to properly 
> handled by application
> --
>
> Key: YARN-4582
> URL: https://issues.apache.org/jira/browse/YARN-4582
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-MAPREDUCE-6476.patch, 0002-MAPREDUCE-6476.patch
>
>
> Steps to reproduce
> ===
> Submit a mapreduce job with
> # map to label x
> # reduce to label y
> Precondition
> # Queue b, to which the reduce is submitted, does not have access to the 
> specified label
> *Impact*
> # Jobs fail only after the RM-AM communication times out
> (about 10 minutes, I think)
> The job should be killed immediately when InvalidResourceRequestException is 
> received in {{RMContainerRequestor#makeRemoteRequest}} (a sketch follows the 
> logs below)
> *Logs*
> {noformat}
> 2015-09-11 16:44:30,116 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, queue=b1 doesn't have permission to access all labels in 
> resource request. labelExpression of resource request=1. Queue labels=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:457)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224)
>   at sun.reflect.GeneratedConstructorAccessor39.newInstance(Unknown 
> Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:251)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy37.allocate(Unknown Source)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:203)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:694)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:263)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:281)
>   at java.lan
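
A minimal sketch of the fail-fast idea proposed above (not the committed 
YARN-4582 fix): if the allocate call is rejected with 
InvalidResourceRequestException, fail the job immediately instead of retrying 
until the RM-AM timeout. The surrounding class and the failJob() hook are 
assumptions for illustration.

{noformat}
import java.io.IOException;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;
import org.apache.hadoop.yarn.exceptions.YarnException;

abstract class FailFastRequestor {
  AllocateResponse makeRemoteRequest(AllocateRequest request)
      throws YarnException, IOException {
    try {
      return allocate(request); // RPC to the RM's ApplicationMasterService
    } catch (InvalidResourceRequestException e) {
      // The request itself is invalid (e.g. the queue cannot access the
      // requested label), so retrying until the RM-AM timeout cannot
      // succeed. Fail the job right away instead.
      failJob("Invalid resource request, failing the job: " + e.getMessage());
      throw e;
    }
  }

  abstract AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException;

  abstract void failJob(String diagnostic);
}
{noformat}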

[jira] [Commented] (YARN-4538) QueueMetrics pending cores and memory metrics wrong

2016-01-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093525#comment-15093525
 ] 

Bibin A Chundatt commented on YARN-4538:


The test case failures are not related to the uploaded patch. Please review 
the patch.

> QueueMetrics pending  cores and memory metrics wrong
> 
>
> Key: YARN-4538
> URL: https://issues.apache.org/jira/browse/YARN-4538
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4538.patch, 0002-YARN-4538.patch, 
> 0003-YARN-4538.patch
>
>
> Submit 2 applications to the default queue.
> Check the queue metrics for pending cores and memory:
> {noformat}
> List<QueueInfo> allQueues = client.getChildQueueInfos("root");
> for (QueueInfo queueInfo : allQueues) {
>   QueueStatistics quastats = queueInfo.getQueueStatistics();
>   System.out.println(quastats.getPendingVCores());
>   System.out.println(quastats.getPendingMemoryMB());
> }
> {noformat}
> *Output:*
> -20
> -20480
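
For reference, the snippet above expanded into a self-contained program (a 
sketch; the client setup and class name are illustrative, the queue-statistics 
calls are the ones from the snippet itself):

{noformat}
import java.util.List;

import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.api.records.QueueStatistics;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PendingMetricsCheck {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // With the bug present, both values can go negative (e.g. -20 / -20480).
      List<QueueInfo> allQueues = client.getChildQueueInfos("root");
      for (QueueInfo queueInfo : allQueues) {
        QueueStatistics stats = queueInfo.getQueueStatistics();
        System.out.println("pending vcores : " + stats.getPendingVCores());
        System.out.println("pending memory : " + stats.getPendingMemoryMB());
      }
    } finally {
      client.stop();
    }
  }
}
{noformat}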



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Johan Gustavsson (JIRA)
Johan Gustavsson created YARN-4583:
--

 Summary: Resource manager should purge generic history data when 
using FileSystemApplicationHistoryStore
 Key: YARN-4583
 URL: https://issues.apache.org/jira/browse/YARN-4583
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.1, 2.7.0, 2.4.1
Reporter: Johan Gustavsson
Assignee: Johan Gustavsson


In its current state, when enabling 
`yarn.timeline-service.generic-application-history.enabled` and setting 
`yarn.timeline-service.generic-application-history.store-class` to 
`org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
files keep building up in the directory until it reaches the maximum number of 
files per directory. There should be a way to configure the RM to purge these 
files.
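
A minimal sketch of one possible purge pass (not the attached YARN-4583.patch; 
the retention config key and the class are hypothetical): delete history files 
older than a configurable age from the store directory.

{noformat}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HistoryFilePurger {
  // Hypothetical key, not an existing YARN property.
  static final String RETENTION_MS_KEY =
      "yarn.timeline-service.generic-application-history.retention-ms";

  /** Delete application history files older than the configured retention. */
  public static void purge(Configuration conf, Path historyDir)
      throws IOException {
    long retentionMs =
        conf.getLong(RETENTION_MS_KEY, 7L * 24 * 60 * 60 * 1000);
    long cutoff = System.currentTimeMillis() - retentionMs;
    FileSystem fs = historyDir.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(historyDir)) {
      // The store keeps per-application history files in this directory;
      // remove files that have not been modified since the cutoff.
      if (status.isFile() && status.getModificationTime() < cutoff) {
        fs.delete(status.getPath(), false);
      }
    }
  }
}
{noformat}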



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Johan Gustavsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson updated YARN-4583:
---
Attachment: YARN-4583.patch

> Resource manager should purge generic history data when using 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-4583
> URL: https://issues.apache.org/jira/browse/YARN-4583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1, 2.7.0, 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Attachments: YARN-4583.patch
>
>
> In its current state, when enabling 
> `yarn.timeline-service.generic-application-history.enabled` and setting 
> `yarn.timeline-service.generic-application-history.store-class` to 
> `org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
> files keep building up in the directory until it reaches the maximum number 
> of files per directory. There should be a way to configure the RM to purge 
> these files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Johan Gustavsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson updated YARN-4583:
---
Description: In its current state, when enabling 
`yarn.timeline-service.generic-application-history.enabled` and setting 
`yarn.timeline-service.generic-application-history.store-class` to 
`org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
files keep building up in the directory until it reaches the maximum number of 
files per directory. There should be a way to configure the RM to purge these 
files.  (was: Init's current state when 
enabling `yarn.timeline-service.generic-application-history.enabled` and 
setting `yarn.timeline-service.generic-application-history.store-class` to 
`org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`
 files keep building up in dir until it reaches max files for dir. There should 
be a way to set the RM to purge these files.)

> Resource manager should purge generic history data when using 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-4583
> URL: https://issues.apache.org/jira/browse/YARN-4583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1, 2.7.0, 2.7.1
>Reporter: Johan Gustavsson
> Attachments: YARN-4583.patch
>
>
> In its current state, when enabling 
> `yarn.timeline-service.generic-application-history.enabled` and setting 
> `yarn.timeline-service.generic-application-history.store-class` to 
> `org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
> files keep building up in the directory until it reaches the maximum number 
> of files per directory. There should be a way to configure the RM to purge 
> these files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Johan Gustavsson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093563#comment-15093563
 ] 

Johan Gustavsson commented on YARN-4583:


Added a patch, pending review

> Resource manager should purge generic history data when using 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-4583
> URL: https://issues.apache.org/jira/browse/YARN-4583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1, 2.7.0, 2.7.1
>Reporter: Johan Gustavsson
> Attachments: YARN-4583.patch
>
>
> In its current state, when enabling 
> `yarn.timeline-service.generic-application-history.enabled` and setting 
> `yarn.timeline-service.generic-application-history.store-class` to 
> `org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
> files keep building up in the directory until it reaches the maximum number 
> of files per directory. There should be a way to configure the RM to purge 
> these files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Johan Gustavsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson updated YARN-4583:
---
Assignee: (was: Johan Gustavsson)

> Resource manager should purge generic history data when using 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-4583
> URL: https://issues.apache.org/jira/browse/YARN-4583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1, 2.7.0, 2.7.1
>Reporter: Johan Gustavsson
> Attachments: YARN-4583.patch
>
>
> In its current state, when enabling 
> `yarn.timeline-service.generic-application-history.enabled` and setting 
> `yarn.timeline-service.generic-application-history.store-class` to 
> `org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
> files keep building up in the directory until it reaches the maximum number 
> of files per directory. There should be a way to configure the RM to purge 
> these files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4584) RM recovery failure when AM is preempted many times

2016-01-12 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4584:
--

 Summary: RM recovery failure when AM is preempted many times
 Key: YARN-4584
 URL: https://issues.apache.org/jira/browse/YARN-4584
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical


Due to the resource limit in the queue, the AM got preempted about 20 times.
On restart, the RM fails to start.

{noformat}
2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: 
noteFailure java.lang.NullPointerException
2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: Service 
RMActiveServices failed in state STARTED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323)
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: 
Service: RMActiveServices entered state STOPPED
2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: 
RMActiveServices: stopping services, size=16

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4584) RM start failure when AM is preempted many times

2016-01-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4584:
---
Summary: RM start failure when AM is preempted many times  (was: RM 
recovery failure when AM is preempted many times)

> RM start failure when AM is preempted many times
> 
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Due to the resource limit in the queue, the AM got preempted about 20 times.
> On restart, the RM fails to start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM start failure when AM is preempted many times

2016-01-12 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093582#comment-15093582
 ] 

Jun Gong commented on YARN-4584:


[~bibinchundatt] Thanks for reporting the issue. It seems the attempt's data is 
missing? If so, the patch for YARN-4497 might solve the problem.

> RM start failure when AM is preempted many times
> 
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Due to the resource limit in the queue, the AM got preempted about 20 times.
> On restart, the RM fails to start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4584) RM start failure when AM is preempted many times

2016-01-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-4584.

Resolution: Duplicate

[~hex108] Thank you. Closing as a duplicate

> RM start failure when AM is preempted many times
> 
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Due to the resource limit in the queue, the AM got preempted about 20 times.
> On restart, the RM fails to start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM start failure when AM is preempted many times

2016-01-12 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093626#comment-15093626
 ] 

Jun Gong commented on YARN-4584:


[~bibinchundatt] Thanks for confirming it.

> RM start failure when AM is preempted many times
> 
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Due to the resource limit in the queue, the AM got preempted about 20 times.
> On restart, the RM fails to start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM start failure when AM is preempted many times

2016-01-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093640#comment-15093640
 ] 

Bibin A Chundatt commented on YARN-4584:


[~hex108]

AM attempts before recovery numbered about 31, and max attempts was 3, so on 
recovery attempts 1-28 were removed from the app state store. Those removed 
attempts were causing the NPE, and RM recovery was failing.

In HA, none of the RMs will be able to become Active

> RM start failure when AM is preempted many times
> 
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Due to the resource limit in the queue, the AM got preempted about 20 times.
> On restart, the RM fails to start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM start failure when AM is preempted many times

2016-01-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093641#comment-15093641
 ] 

Bibin A Chundatt commented on YARN-4584:


In HA, none of the RMs will be able to become Active

> RM start failure when AM is preempted many times
> 
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Due to the resource limit in the queue, the AM got preempted about 20 times.
> On restart, the RM fails to start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2016-01-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4497:
---
Priority: Critical  (was: Major)

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Critical
> Attachments: YARN-4497.01.patch
>
>
> Found the following problem while discussing YARN-3480.
> If the RM fails to store some attempts in the RMStateStore, those attempts 
> will be missing from the store. For example, when storing attempt1, attempt2, 
> and attempt3, the RM successfully stored attempt1 and attempt3 but failed to 
> store attempt2. When the RM restarts, *RMAppImpl#recover* recovers attempts 
> one by one; for this case it will recover attempt1, then attempt2. When 
> recovering attempt2, *((RMAppAttemptImpl)this.currentAttempt).recover(state)* 
> first looks up its ApplicationAttemptStateData but cannot find it, and an 
> error is raised at *assert attemptState != null* (*RMAppAttemptImpl#recover*, 
> line 880).
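
A minimal sketch of the defensive direction described above (not the actual 
YARN-4497 patch; the generic types are stand-ins for ApplicationAttemptId and 
ApplicationAttemptStateData): tolerate attempts whose saved state is absent 
instead of tripping the assert/NPE during recovery.

{noformat}
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class AttemptRecoverySketch {
  private static final Log LOG =
      LogFactory.getLog(AttemptRecoverySketch.class);

  /**
   * Recover attempts one by one from a store snapshot (attempt id -> saved
   * state), skipping attempts whose state was never persisted.
   */
  public static <K, S> void recoverAttempts(Iterable<K> attemptIds,
      Map<K, S> storedAttempts) {
    for (K attemptId : attemptIds) {
      S attemptState = storedAttempts.get(attemptId);
      if (attemptState == null) {
        // Missing in the state store (store failure, or trimmed by cleanup):
        // skip this attempt instead of failing the whole RM recovery with
        // the assert/NPE seen in RMAppAttemptImpl#recover.
        LOG.warn("No saved state for attempt " + attemptId + ", skipping");
        continue;
      }
      // ... recover the attempt from attemptState ...
    }
  }
}
{noformat}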



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093685#comment-15093685
 ] 

Hadoop QA commented on YARN-4583:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} Patch generated 14 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 236, now 249). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 4s {color} 
| {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 30s {color} 
| {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed 
with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
33s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 25s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.applicationhistoryservice.TestFileSystemApplicationHistoryStore
 |
| JDK v1.7.0_91 Timed out junit tests | 
org.apache.hadoop.yarn.

[jira] [Updated] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Johan Gustavsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson updated YARN-4583:
---
Attachment: YARN-4583.001.patch

Fixed formatting according to QA output

> Resource manager should purge generic history data when using 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-4583
> URL: https://issues.apache.org/jira/browse/YARN-4583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1, 2.7.0, 2.7.1
>Reporter: Johan Gustavsson
> Attachments: YARN-4583.001.patch, YARN-4583.patch
>
>
> In it's current state when enabling 
> `yarn.timeline-service.generic-application-history.enabled` and setting 
> `yarn.timeline-service.generic-application-history.store-class` to 
> `org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`
>  files keep building up in dir until it reaches max files for dir. There 
> should be a way to set the RM to purge these files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093797#comment-15093797
 ] 

Hadoop QA commented on YARN-4583:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 34s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 236, now 235). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 44s {color} 
| {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 34s {color} 
| {color:red} hadoop-yarn-server-applicationhistoryservice in the patch failed 
with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 34s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.applicationhistoryservice.TestFileSystemApplicationHistoryStore
 |
| JDK v1.7.0_91 Timed out junit tests | 
org.apache.hadoop.yar

[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource

2016-01-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093819#comment-15093819
 ] 

Bibin A Chundatt commented on YARN-4575:


Patch attached to return {{ApplicationResourceUsageReport}} with reserved 
memory as the sum over all partitions.
Please review the attached patch.
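
For illustration, the aggregation being described amounts to something like 
the sketch below. The per-partition map is an assumption for this example; 
only the {{Resources}} helper calls are real API.

{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Hypothetical helper: sum the reserved resource over every partition
// instead of reporting only the default partition's share.
public final class ReservedSumSketch {
  public static Resource totalReserved(
      Map<String, Resource> reservedByPartition) {
    Resource total = Resources.createResource(0, 0);
    for (Resource perPartition : reservedByPartition.values()) {
      Resources.addTo(total, perPartition); // accumulate into total
    }
    return total;
  }
}
{code}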

> ApplicationResourceUsageReport should return ALL  reserved resource
> ---
>
> Key: YARN-4575
> URL: https://issues.apache.org/jira/browse/YARN-4575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4575.patch
>
>
> The reserved resource in ApplicationResourceUsageReport covers only the 
> default partition; it should cover all partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource

2016-01-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093821#comment-15093821
 ] 

Bibin A Chundatt commented on YARN-4575:


All the test-case failures are already handled as part of YARN-4478.

> ApplicationResourceUsageReport should return ALL  reserved resource
> ---
>
> Key: YARN-4575
> URL: https://issues.apache.org/jira/browse/YARN-4575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4575.patch
>
>
> The reserved resource in ApplicationResourceUsageReport covers only the 
> default partition; it should cover all partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4581) thread leak makes RM crash while RM is recovering

2016-01-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093925#comment-15093925
 ] 

Junping Du commented on YARN-4581:
--

Hi [~sandflee], thanks for reporting the issue and delivering the patch.
As Naga mentioned above, AHS is already a deprecated feature in the community, 
and ATS (the Application Timeline Service) has been its replacement since 
2.6.0. Do you plan to migrate to ATS instead of AHS?

> thread leak makes RM crash while RM is recovering
> -
>
> Key: YARN-4581
> URL: https://issues.apache.org/jira/browse/YARN-4581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4581.01.patch
>
>
> We enabled ApplicationHistoryWriter and found thousands of errors:
> {quote}
> 2016-01-08 03:13:03,441 ERROR 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
>  Error when openning history file of application 
> application_1451878591907_0197
> java.io.IOException: Output file not at zero offset.
> at 
> org.apache.hadoop.io.file.tfile.BCFile$Writer.<init>(BCFile.java:288)
> at org.apache.hadoop.io.file.tfile.TFile$Writer.<init>(TFile.java:288)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:728)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> This led to the RM crashing:
> {quote}
> 2016-01-08 03:13:08,335 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2033)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1652)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1573)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1603)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1591)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:324)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:324)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:723)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> After several failovers, once the RM finished recovering, thousands of HDFS 
> client threads were leaked in the RM.
> {quote}
> "Thread-22723" #22893 daemon prio=5 os_prio=0 tid=0x7f75f0346000 
> nid=0x13

[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093946#comment-15093946
 ] 

Jason Lowe commented on YARN-4371:
--

I agree that the space delimiter is my preferred choice. I was just pointing 
out that, as it is coded today, users can add other options along with the 
kill option that are unsupported or potentially nonsensical, like movetoqueue, 
and those other options will be totally ignored, which is not ideal. This 
doesn't mean we can't use spaces; it just means the code needs to check for 
incompatible or unsupported options being present when it processes the kill 
option.
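
A minimal sketch of such a check, assuming the commons-cli {{CommandLine}} the 
CLI already uses (the list of conflicting option names is illustrative, not 
exhaustive):

{code}
import org.apache.commons.cli.CommandLine;

// Hypothetical guard: refuse to process -kill when unrelated options are
// also present, instead of silently ignoring them.
final class KillOptionGuardSketch {
  static boolean hasConflictingOptions(CommandLine cli) {
    String[] incompatible = { "movetoqueue", "queue", "status" };
    for (String opt : incompatible) {
      if (cli.hasOption(opt)) {
        return true; // caller should print usage and exit non-zero
      }
    }
    return false;
  }
}
{code}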

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application 
> -kill" command. The command should take multiple application ids at the same 
> time. Entries should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2016-01-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093986#comment-15093986
 ] 

Jason Lowe commented on YARN-4414:
--

+1 committing this.

> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>Assignee: Chang Li
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.3.patch, YARN-4414.1.patch, YARN-4414.2.patch, YARN-4414.3.patch
>
>
> This is related to YARN-3238. We ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException. 
> The fix for YARN-3238 was too specific; I think we need a more general 
> solution that catches a wider array of connection errors, to avoid retrying 
> them at both the RPC layer and the NM proxy layer.
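
For context, the general shape of such a fix is to classify the 
connection-level exceptions once and retry them at a single layer. A hedged 
sketch using Hadoop's retry utilities (the exact policy wiring in the patch 
may differ):

{code}
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Retry a whole family of connection errors in one place rather than
// handling each exception type at both the RPC and NM proxy layers.
public final class ConnectionRetrySketch {
  static RetryPolicy build() {
    RetryPolicy retry = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        10, 1, TimeUnit.SECONDS);
    Map<Class<? extends Exception>, RetryPolicy> policies = new HashMap<>();
    policies.put(ConnectException.class, retry);
    policies.put(NoRouteToHostException.class, retry);
    // Anything not classified as a connection error fails fast.
    return RetryPolicies.retryByException(
        RetryPolicies.TRY_ONCE_THEN_FAIL, policies);
  }
}
{code}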



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093971#comment-15093971
 ] 

Sunil G commented on YARN-4371:
---

Thank you [~jlowe] for the input and thank you [~Naganarasimha Garla].

{noformat}
root@sunil-Inspiron-3543:/opt/hadoop/trunk/hadoop-3.0.0-SNAPSHOT/bin# ./yarn 
application -kill application_1452608662572_0002 application_1452608662572_0008 
application_1452608662572_0003

16/01/12 20:02:12 INFO client.RMProxy: Connecting to ResourceManager at 
/127.0.0.1:25001
Application application_1452608662572_0002 has already finished 
Application with id 'application_1452608662572_0008' doesn't exist in RM.
Killing application application_1452608662572_0003
16/01/12 20:02:16 INFO impl.YarnClientImpl: Killed application 
application_1452608662572_0003
{noformat}

Here I have tried to kill 3 apps (1 running, 1 finished, and 1 invalid). As 
per the latest patch, we get the exact error: {{Application with id 
'application_1452608662572_0008' doesn't exist in RM.}}
Will this be fine here?

bq. I was just pointing out that as it is coded today users can add other 
options along with the kill option that are not supported, or potentially 
nonsensical like movetoqueue

Yes, I agree with this part; it is not desirable to allow other co-options 
along with kill. More validation can be added to ensure that {{kill}} is an 
option that cannot be combined with other commands.


> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application 
> -kill" command. The command should take multiple application ids at the same 
> time. Entries should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093996#comment-15093996
 ] 

Sunil G commented on YARN-4371:
---

Yes, I missed the code below.

{code}
+// If one or all applications are not found, throw back exception
+// to return proper error code.
+if (reportFailure) {
+  throw new ApplicationNotFoundException("Application doesn't exist in RM.");
+}
{code}

The existing {{killApplication}} already has the capability to throw an 
exception if an app is not found, and I have kept that code, so we will still 
print the application-not-found message with the app ids correctly on the 
console. However, there can be cases where all the apps given to kill are not 
present in the RM. The console will print the errors as it already does, but 
we also need to send back a non-zero exit code. For that, I am rethrowing the 
exception to return a proper error code in a full-failure scenario. In 
partial-success cases, we will return 0.

I could return an error code rather than throwing again, but that would need 
more if-checks in the parser code below. So the dummy message is only there to 
produce the error code.
{code}
} else if (cliParser.hasOption(KILL_CMD)) {
  if (args.length < 3) {
    printUsage(title, opts);
    return exitCode;
  }
  try {
    killApplication(cliParser.getOptionValues(KILL_CMD));
  } catch (ApplicationNotFoundException e) {
    return exitCode;
  }
{code}
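
Putting the two snippets together, the intended control flow is roughly the 
sketch below, as it might sit inside ApplicationCLI (helper and variable names 
are assumed, not the patch itself):

{code}
// Continue past unknown application ids, but exit non-zero only when every
// given id was unknown, as described above.
private void killApplications(String[] applicationIds)
    throws YarnException, IOException {
  boolean allMissing = true;
  for (String applicationId : applicationIds) {
    try {
      killApplication(applicationId); // existing per-app kill; prints status
      allMissing = false;             // at least one app was found in the RM
    } catch (ApplicationNotFoundException e) {
      continue;                       // keep processing the remaining ids
    }
  }
  if (allMissing) {
    throw new ApplicationNotFoundException("Application doesn't exist in RM.");
  }
}
{code}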


> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application 
> -kill" command. The command should take multiple application ids at the same 
> time. Entries should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4109) Exception on RM scheduler page loading with labels

2016-01-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094019#comment-15094019
 ] 

Sunil G commented on YARN-4109:
---

+1. Yes, this will be helpful in the 2.8 release.

> Exception on RM scheduler page loading with labels
> --
>
> Key: YARN-4109
> URL: https://issues.apache.org/jira/browse/YARN-4109
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4109_1.patch
>
>
> Configure node labels and load the scheduler page.
> On each reload of the page, the below exception gets thrown in the logs:
> {code}
> 2015-09-03 11:27:08,544 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/scheduler
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:139)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:663)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>  

[jira] [Updated] (YARN-3842) NMProxy should retry on NMNotYetReadyException

2016-01-12 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3842:
-
Target Version/s: 2.6.3, 2.7.1  (was: 2.7.1, 2.6.3)
   Fix Version/s: 2.6.4

I committed this to branch-2.6.

> NMProxy should retry on NMNotYetReadyException
> --
>
> Key: YARN-3842
> URL: https://issues.apache.org/jira/browse/YARN-3842
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
>Priority: Critical
> Fix For: 2.7.1, 2.6.4
>
> Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
> YARN-3842.001.patch, YARN-3842.002.patch
>
>
> Consider the following scenario:
> 1. RM assigns a container on node N to an app A.
> 2. Node N is restarted
> 3. A tries to launch container on node N.
> Step 3 could lead to an NMNotYetReadyException, depending on whether NM N 
> has registered with the RM. In MR, this is considered a task-attempt 
> failure; a few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3695) ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception.

2016-01-12 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3695:
-
Fix Version/s: 2.6.4
   2.7.3

I committed this to branch-2.7 and branch-2.6.

> ServerProxy (NMProxy, etc.) shouldn't retry forever for non network exception.
> --
>
> Key: YARN-3695
> URL: https://issues.apache.org/jira/browse/YARN-3695
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Raju Bairishetti
> Fix For: 2.8.0, 2.7.3, 2.6.4
>
> Attachments: YARN-3695.01.patch, YARN-3695.patch
>
>
> YARN-3646 fixed the retry-forever policy in RMProxy so that it applies only 
> to limited exceptions rather than all exceptions. We may need the same fix 
> for ServerProxy (NMProxy).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Extend blacklist mechanism to protect AM failed multiple times on failure nodes

2016-01-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094097#comment-15094097
 ] 

Junping Du commented on YARN-4576:
--

YARN-2005 sounds related. However, I am still checking whether all the cases 
we are facing can be covered by that JIRA's work.

> Extend blacklist mechanism to protect AM failed multiple times on failure 
> nodes
> ---
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> The current YARN blacklist mechanism tracks bad nodes per AM: if an AM's 
> attempts to launch containers on a specific node fail several times, the AM 
> blacklists that node in future resource requests. This mechanism works fine 
> for normal containers. However, from our observation of several clusters: if 
> a problematic node fails to launch an AM, the RM can pick that same node for 
> the next AM attempts again and again, causing application failure when other 
> functional nodes are busy. In the normal case, a customized health-checker 
> script cannot be so sensitive as to mark a node unhealthy when one or two 
> container launches fail. On the RM side, however, we can blacklist such 
> nodes for AM launches for a certain time if AM launches on them have failed 
> before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-12 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094102#comment-15094102
 ] 

Rohith Sharma K S commented on YARN-4371:
-

Looking into the discussion:
# I agree with [~jlowe]'s comment that the argument-count check should be made 
very specific to kill. With the patch, a user can combine it with other 
sub-commands; I missed this part :-(
# As per Jason Lowe's very first comment, the patch keeps the Linux behavior 
that application ids are space-separated values rather than comma-separated 
values. I think that should be fine.
# About the log message: the existing method {{killApplication(applicationId);}} 
prints a useful message whether or not the application exists. IMO, the newly 
added code need not print this message again. Thoughts?
# Given the assumption in point 3, the code below makes sure to keep the Linux 
return-code behavior. Catching the exception per application just ensures we 
continue with the remaining applications.
{code}
  catch (ApplicationNotFoundException e) {
    // Suppress all ApplicationNotFoundException for now.
    continue;
  }
}
if (reportFailure) {
  throw new ApplicationNotFoundException("Application doesn't exist in RM.");
}
{code}
I.e., when killing multiple processes, if one kill succeeds then the return 
code is zero:
{code}
root1@root1-ThinkPad-T440p:~/workspace/project_home/hadoop-3.0.0-SNAPSHOT/bin$ 
kill 24187 12345
bash: kill: (12345) - No such process
root1@root1-ThinkPad-T440p:~/workspace/project_home/hadoop-3.0.0-SNAPSHOT/bin$ 
echo $?
0
{code}

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application 
> -kill" command. The command should take multiple application ids at the same 
> time. Entries should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Extend blacklist mechanism to protect AM failed multiple times on failure nodes

2016-01-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094108#comment-15094108
 ] 

Sunil G commented on YARN-4576:
---

Hi [~djp]
After YARN-4284, all container launch errors except container preemption are 
considered for AM blacklisting.

> Extend blacklist mechanism to protect AM failed multiple times on failure 
> nodes
> ---
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> The current YARN blacklist mechanism tracks bad nodes per AM: if an AM's 
> attempts to launch containers on a specific node fail several times, the AM 
> blacklists that node in future resource requests. This mechanism works fine 
> for normal containers. However, from our observation of several clusters: if 
> a problematic node fails to launch an AM, the RM can pick that same node for 
> the next AM attempts again and again, causing application failure when other 
> functional nodes are busy. In the normal case, a customized health-checker 
> script cannot be so sensitive as to mark a node unhealthy when one or two 
> container launches fail. On the RM side, however, we can blacklist such 
> nodes for AM launches for a certain time if AM launches on them have failed 
> before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4414) Nodemanager connection errors are retried at multiple levels

2016-01-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094148#comment-15094148
 ] 

Hudson commented on YARN-4414:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9094 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9094/])
YARN-4414. Nodemanager connection errors are retried at multiple levels. 
(jlowe: rev 13de8359a1c6d9fc78cd5013c860c1086d86176f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java


> Nodemanager connection errors are retried at multiple levels
> 
>
> Key: YARN-4414
> URL: https://issues.apache.org/jira/browse/YARN-4414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Jason Lowe
>Assignee: Chang Li
> Fix For: 2.7.3, 2.6.4
>
> Attachments: YARN-4414.1.2.patch, YARN-4414.1.2.patch, 
> YARN-4414.1.3.patch, YARN-4414.1.patch, YARN-4414.2.patch, YARN-4414.3.patch
>
>
> This is related to YARN-3238. We ran into more scenarios where connection 
> errors are being retried at multiple levels, like NoRouteToHostException. 
> The fix for YARN-3238 was too specific; I think we need a more general 
> solution that catches a wider array of connection errors, to avoid retrying 
> them at both the RPC layer and the NM proxy layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Extend blacklist mechanism to protect AM failed multiple times on failure nodes

2016-01-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094158#comment-15094158
 ] 

Junping Du commented on YARN-4576:
--

Thanks for pointing it out, [~sunilg]. From briefly looking at YARN-4284, I 
think its rules for picking up nodes for AMs could be too strict. The side 
effects could be (I haven't gone through the implementation yet): 
1. In a small cluster, all nodes could be blacklisted for AM launching. 
2. In a larger cluster, AMs could get aggregated on a small set of nodes 
(those without prior container failures), causing network congestion on these 
nodes and affecting the apps running there.
3. Some problematic apps (malicious or not) could launch problematic 
containers that cause many innocent NMs to get blacklisted.
I need to go through YARN-4284 in more detail for more ideas, but I guess we 
should find a different balance for some cases/scenarios.

> Extend blacklist mechanism to protect AM failed multiple times on failure 
> nodes
> ---
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> The current YARN blacklist mechanism tracks bad nodes per AM: if an AM's 
> attempts to launch containers on a specific node fail several times, the AM 
> blacklists that node in future resource requests. This mechanism works fine 
> for normal containers. However, from our observation of several clusters: if 
> a problematic node fails to launch an AM, the RM can pick that same node for 
> the next AM attempts again and again, causing application failure when other 
> functional nodes are busy. In the normal case, a customized health-checker 
> script cannot be so sensitive as to mark a node unhealthy when one or two 
> container launches fail. On the RM side, however, we can blacklist such 
> nodes for AM launches for a certain time if AM launches on them have failed 
> before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Extend blacklist mechanism to protect AM failed multiple times on failure nodes

2016-01-12 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094177#comment-15094177
 ] 

Sunil G commented on YARN-4576:
---

Yes, [~djp]. The rules are currently made strict; the main thought was to 
ensure container failures are considered for blacklisting as a safety measure. 
I agree that it is stricter than needed, so a dampening factor or dead zone 
could be introduced to ensure we do not fall into the cases you mentioned.

Now we only have {{am.blacklisting.disable-failure-threshold}}, which defaults 
to 80%, to keep the whole cluster from being blacklisted. +1 for having some 
more tuning configs here.
I feel that, based on the container errors, we can decide how long to 
blacklist a node (a time-based blacklist may also be better).
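
A minimal sketch of the time-based blacklist idea (all names are illustrative; 
this is not YARN-4284's implementation):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical structure: blacklist a node for AM placement only until its
// entry expires, so one bad launch does not exclude the node forever.
public final class TimedAmBlacklistSketch {
  private final Map<String, Long> expiryByNode = new ConcurrentHashMap<>();
  private final long blacklistMs;

  public TimedAmBlacklistSketch(long blacklistMs) {
    this.blacklistMs = blacklistMs;
  }

  public void recordAmLaunchFailure(String nodeId) {
    expiryByNode.put(nodeId, System.currentTimeMillis() + blacklistMs);
  }

  public boolean isBlacklisted(String nodeId) {
    Long expiry = expiryByNode.get(nodeId);
    if (expiry == null) {
      return false;
    }
    if (System.currentTimeMillis() >= expiry) {
      expiryByNode.remove(nodeId); // entry aged out
      return false;
    }
    return true;
  }
}
{code}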

> Extend blacklist mechanism to protect AM failed multiple times on failure 
> nodes
> ---
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> The current YARN blacklist mechanism tracks bad nodes per AM: if an AM's 
> attempts to launch containers on a specific node fail several times, the AM 
> blacklists that node in future resource requests. This mechanism works fine 
> for normal containers. However, from our observation of several clusters: if 
> a problematic node fails to launch an AM, the RM can pick that same node for 
> the next AM attempts again and again, causing application failure when other 
> functional nodes are busy. In the normal case, a customized health-checker 
> script cannot be so sensitive as to mark a node unhealthy when one or two 
> container launches fail. On the RM side, however, we can blacklist such 
> nodes for AM launches for a certain time if AM launches on them have failed 
> before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler

2016-01-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4492:

Attachment: YARN-4492.v2.001.patch

> Add documentation for queue level preemption which is supported in Capacity 
> scheduler
> -
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-workload cluster but is 
> currently missing documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler

2016-01-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094226#comment-15094226
 ] 

Naganarasimha G R commented on YARN-4492:
-

Thanks for the review, [~eepayne].
bq. The new section added to CapacityScheduler.md is titled *Queue preemption 
support*. Should these (and other preemption-related properties) be documented 
here?
I felt it was required, and I have redone the patch with the following 
modifications:
* It was wrongly documented under *Elasticity Feature* that preemption is not 
supported; I have corrected that.
* Captured all preemption configurations for the capacity scheduler.
* Created a new heading under configuration and updated the contents.
* Reorganised the contents under configurations, as the priority configuration 
was wrongly placed by YARN-4098.

> Add documentation for queue level preemption which is supported in Capacity 
> scheduler
> -
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-workload cluster but is 
> currently missing documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler

2016-01-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4492:

Attachment: CapacityScheduler.html

Attaching the generated HTML file for reference.

> Add documentation for queue level preemption which is supported in Capacity 
> scheduler
> -
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-workload cluster but is 
> currently missing documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.

2016-01-12 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4552:
-
Attachment: YARN-4552-v2.patch

Thanks [~xgong] for the review! I have incorporated your comments in the v2 
patch and added a unit test (to TestNodeManagerReboot) to verify how the patch 
works. One thing to mention: some existing tests in TestNodeManagerReboot fail 
even without this patch, but the newly added test case passes. I will file a 
separate JIRA to track the existing test failures in TestNodeManagerReboot in 
case no one else has filed one.

> NM ResourceLocalizationService should check and initialize local filecache 
> dir (and log dir) even if NM recover is enabled.
> ---
>
> Key: YARN-4552
> URL: https://issues.apache.org/jira/browse/YARN-4552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4552-v2.patch, YARN-4552.patch
>
>
> In some cases, users clean up the localized file cache for 
> debugging/troubleshooting purposes during NM downtime. However, after 
> bringing the NM back (with recovery enabled), job submission can fail with 
> an exception like the one below:
> {noformat}
> Diagnostics: java.io.FileNotFoundException: File 
> /disk/12/yarn/local/filecache does not exist.
> {noformat}
> This is because we only create the filecache dir when recovery is not 
> enabled while ResourceLocalizationService gets initialized/started.
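
As a hedged sketch of the shape of the fix: verify and recreate the local dirs 
during service init regardless of the recovery flag. The method and parameter 
names are assumptions; the FileContext calls are real API.

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hypothetical init step: ensure each local dir has its filecache subdir
// even when NM recovery is enabled, since an admin may have wiped it
// while the NM was down.
final class LocalDirInitSketch {
  static void ensureFilecacheDirs(FileContext lfs, List<String> localDirs)
      throws IOException {
    for (String dir : localDirs) {
      Path filecache = new Path(dir, "filecache");
      if (!lfs.util().exists(filecache)) {
        lfs.mkdir(filecache, new FsPermission((short) 0755), true);
      }
    }
  }
}
{code}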



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4492) Add documentation for preemption supported in Capacity scheduler

2016-01-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4492:

Summary: Add documentation for preemption supported in Capacity scheduler  
(was: Add documentation for queue level preemption which is supported in 
Capacity scheduler)

> Add documentation for preemption supported in Capacity scheduler
> 
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-workload cluster but is 
> currently missing documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4553) Add cgroups support for docker containers

2016-01-12 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094213#comment-15094213
 ] 

Varun Vasudev commented on YARN-4553:
-

+1 - I'll commit this tomorrow if no one objects.

> Add cgroups support for docker containers
> -
>
> Key: YARN-4553
> URL: https://issues.apache.org/jira/browse/YARN-4553
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-4553.001.patch, YARN-4553.002.patch, 
> YARN-4553.003.patch
>
>
> Currently, cgroups-based resource isolation does not work with docker 
> containers under YARN. The processes in these containers are launched by the 
> docker daemon and are not children of a container-executor process. Docker 
> supports a --cgroup-parent flag which can be used to point to the 
> container-specific cgroups that are created by the nodemanager. This will 
> allow the nodemanager to manage cgroups (as it does today) while allowing 
> resource isolation to work with docker containers. 
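
Illustratively, the resulting launch ends up equivalent to something like the 
sketch below. The cgroup path format is an assumption; the --cgroup-parent 
flag itself is real docker.

{code}
import java.util.Arrays;
import java.util.List;

// Hypothetical command construction: point docker at the NM-created,
// container-specific cgroup so the NM keeps managing resource limits.
final class DockerCgroupParentSketch {
  static List<String> launchCommand(String containerId, String image) {
    return Arrays.asList(
        "docker", "run",
        "--cgroup-parent=/hadoop-yarn/" + containerId, // assumed cgroup path
        image);
  }
}
{code}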



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4492) Add documentation for preemption supported in Capacity scheduler

2016-01-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4492:

Description: 
As part of YARN-2056, support has been added to disable preemption for a 
specific queue. This is a useful feature in a multi-workload cluster but is 
currently missing documentation. 
Preemption as a whole is not documented; hence, update all the capacity 
scheduler preemption configurations.

  was:As part of YARN-2056, support has been added to disable preemption for 
a specific queue. This is a useful feature in a multi-workload cluster but is 
currently missing documentation. 


> Add documentation for preemption supported in Capacity scheduler
> 
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-workload cluster but is 
> currently missing documentation. 
> Preemption as a whole is not documented; hence, update all the capacity 
> scheduler preemption configurations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler

2016-01-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4492:

Attachment: (was: CapacityScheduler.html)

> Add documentation for queue level preemption which is supported in Capacity 
> scheduler
> -
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4492.v1.001.patch, YARN-4492.v1.002.patch, 
> YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-workload cluster but is 
> currently missing documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4492) Add documentation for preemption supported in Capacity scheduler

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094272#comment-15094272
 ] 

Hadoop QA commented on YARN-4492:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 11s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12781847/YARN-4492.v2.001.patch
 |
| JIRA Issue | YARN-4492 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 68878194c5bd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 13de835 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Max memory used | 29MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10243/console |


This message was automatically generated.



> Add documentation for preemption supported in Capacity scheduler
> 
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-workload cluster but is 
> currently missing documentation. 
> Preemption as a whole is not documented; hence, update all the capacity 
> scheduler preemption configurations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4512) Provide a knob to turn on over-allocation

2016-01-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4512:
---
Attachment: yarn-4512-yarn-1011.003.patch

Updated patch:
# Addresses the inconsistency by changing all configs and methods to 
"overallocation".
# Fixes some of the reasonable checkstyle issues.

> Provide a knob to turn on over-allocation
> -
>
> Key: YARN-4512
> URL: https://issues.apache.org/jira/browse/YARN-4512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: YARN-4512-YARN-1011.001.patch, 
> yarn-4512-yarn-1011.002.patch, yarn-4512-yarn-1011.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4512) Provide a knob to turn on over-allocation

2016-01-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4512:
---
Description: We need two configs for overallocation - one to specify the 
threshold up to which it is okay to over-allocate, and another to specify the 
threshold after which OPPORTUNISTIC containers should be preempted.
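
For illustration only, the two knobs might be wired like the sketch below; 
both property names and values are invented here, and the patch defines the 
real ones.

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical configuration: over-allocate until utilization hits the
// first threshold; preempt OPPORTUNISTIC containers past the second.
final class OverallocationConfigSketch {
  static void apply(Configuration conf) {
    conf.setFloat("yarn.nodemanager.overallocation.threshold", 0.75f);
    conf.setFloat("yarn.nodemanager.overallocation.preemption-threshold",
        0.90f);
  }
}
{code}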

> Provide a knob to turn on over-allocation
> -
>
> Key: YARN-4512
> URL: https://issues.apache.org/jira/browse/YARN-4512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: YARN-4512-YARN-1011.001.patch, 
> yarn-4512-yarn-1011.002.patch, yarn-4512-yarn-1011.003.patch
>
>
> We need two configs for overallocation - one to specify the threshold up to 
> which it is okay to over-allocate, and another to specify the threshold 
> after which OPPORTUNISTIC containers should be preempted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4565) When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, Sometimes lead to situation where all queue resources consumed by AMs only

2016-01-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094327#comment-15094327
 ] 

Naganarasimha G R commented on YARN-4565:
-

Hi [~wangda],
The patch is generally fine, with very small nits:
* One checkstyle issue is related to the patch (unused import).
* In *TestFairOrderingPolicy*:
** {{MockNM nm1 = rm.registerNode("h1:1234", 10 * GB); // label = x}}: I think 
assigning this to a variable is not required here.
** {{OrderingPolicy policy = lq.getOrderingPolicy();}}: we can use generics 
(OrderingPolicy) here to avoid warnings.
** {{Assert.assertTrue(((FairOrderingPolicy)policy).getSizeBasedWeight());}}: 
similar to the above comment, for FairOrderingPolicy.

> When sizeBasedWeight enabled for FairOrderingPolicy in CapacityScheduler, 
> Sometimes lead to situation where all queue resources consumed by AMs only
> 
>
> Key: YARN-4565
> URL: https://issues.apache.org/jira/browse/YARN-4565
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0
>Reporter: Karam Singh
>Assignee: Wangda Tan
> Attachments: YARN-4565.1.patch, YARN-4565.2.patch
>
>
> When sizeBasedWeight is enabled for FairOrderingPolicy in CapacityScheduler, 
> it can sometimes lead to a situation where all queue resources are consumed 
> by AMs only. From the user's perspective, it appears that all applications 
> in the queue are stuck, while the whole queue capacity is consumed by AMs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4552) NM ResourceLocalizationService should check and initialize local filecache dir (and log dir) even if NM recover is enabled.

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094349#comment-15094349
 ] 

Hadoop QA commented on YARN-4552:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 (total was 152, now 154). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 42s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 13s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 58s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12781848/YARN-4552-v2.patch |
| JIRA Issue | YARN-4552 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b2224ad52d79 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |

[jira] [Updated] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-01-12 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4577:

Attachment: YARN-4577.2.patch

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of a plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. And if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-12 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4371:
--
Attachment: 0004-YARN-4371.patch

Attaching an updated patch with updated validation.

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch, 0004-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application -kill" 
> command. The command should take multiple application ids at the same time. 
> Each entry should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-01-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094397#comment-15094397
 ] 

Xuan Gong commented on YARN-4577:
-

Attached a new patch to fix the -1 on findbugs.

bq. It seems that we should try to reuse the ApplicationClassLoader for this 
use case instead of creating another variant. Thoughts?

To my understanding, the ApplicationClassLoader appends classes from the 
application JARs in preference to the parent loader. It cannot fix our problem 
completely.
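
As a rough illustration of the isolation idea (a sketch only, not the 
YARN-4577 patch; the jar path and class name are hypothetical placeholders 
supplied by configuration):

{code}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedAuxServiceLoader {
  // Load the aux service from its own jar via a dedicated classloader,
  // keeping its dependencies off the NM's system classpath.
  public static Object loadService(String jarPath, String className)
      throws Exception {
    URL[] urls = { new File(jarPath).toURI().toURL() };
    // Parent-last (child-first) delegation would be needed to fully
    // shadow conflicting NM dependencies; this sketch is parent-first.
    URLClassLoader loader = new URLClassLoader(
        urls, IsolatedAuxServiceLoader.class.getClassLoader());
    Class<?> clazz = Class.forName(className, true, loader);
    return clazz.getDeclaredConstructor().newInstance();
  }
}
{code}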

> Enable aux services to have their own custom classpath/jar file
> ---
>
> Key: YARN-4577
> URL: https://issues.apache.org/jira/browse/YARN-4577
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4577.1.patch, YARN-4577.2.patch
>
>
> Right now, users have to add their jars to the NM classpath directly, thus 
> putting them on the system classloader. But if multiple versions of a plugin 
> are present on the classpath, there is no control over which version actually 
> gets loaded. And if there are any conflicts between the dependencies 
> introduced by the auxiliary service and the NM itself, they can break the NM, 
> the auxiliary service, or both.
> The solution could be to instantiate aux services using a classloader that 
> is different from the system classloader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-12 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: 0011-YARN-4304.patch

Updating the patch as per the comments. [~leftnoteasy], kindly help to check 
the same.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> 0008-YARN-4304.patch, 0009-YARN-4304.patch, 0010-YARN-4304.patch, 
> 0011-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition-level max AM resource percentage 
> configuration, the UI and various metrics also need to display the correct 
> configurations related to the same. 
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in Scheduler UI (label also) and in 
> ClusterMetrics page
> - Update queue/partition related metrics w.r.t per-partition 
> am-resource-percentage
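
For context, the existing queue-level knob in capacity-scheduler.xml looks 
like the sketch below (assuming the standard per-queue property; the 
per-partition variant is what this JIRA surfaces in the UI and metrics):

{code}
<property>
  <!-- Fraction of the queue's resources that can be used to run AMs. -->
  <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
  <value>0.2</value>
</property>
{code}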



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4585) CapacitySchedulerQueueInfo need to have missing informations from CapacitySchedulerInfo

2016-01-12 Thread Sunil G (JIRA)
Sunil G created YARN-4585:
-

 Summary: CapacitySchedulerQueueInfo need to have missing 
informations from CapacitySchedulerInfo
 Key: YARN-4585
 URL: https://issues.apache.org/jira/browse/YARN-4585
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G


CapacitySchedulerQueueInfo can have information such as 
- capacity for root queue
- CapacitySchedulerHealthInfo
- CapacitySchedulerQueueInfoList (field is present)

This ticket will address adding the missing information to 
CapacitySchedulerQueueInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4512) Provide a knob to turn on over-allocation

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094426#comment-15094426
 ] 

Hadoop QA commented on YARN-4512:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 58s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 8 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 318, now 325). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 52s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 28s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:gre

[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2016-01-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2031:
-
Attachment: YARN-2031-003.patch

sync up with trunk; doesn't address any of the open issues

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: BB2015-05-TBR
> Attachments: YARN-2031-002.patch, YARN-2031-003.patch, 
> YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy needs to forward the 
> verbs, or the AM filter should not filter the REST part of the web site.
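
To illustrate the first point, a minimal sketch (not the actual AM filter) of 
answering with 307 Temporary Redirect so REST clients re-issue 
PUT/POST/DELETE verbs against the proxy; the proxy URL is a placeholder:

{code}
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TemporaryRedirectFilter implements Filter {
  private static final String PROXY_URL = "http://proxy.example.com";

  @Override public void init(FilterConfig conf) {}
  @Override public void destroy() {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    // 307 preserves the HTTP verb across the redirect, whereas 302
    // lets clients fall back to GET.
    httpRes.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);
    httpRes.setHeader("Location", PROXY_URL + httpReq.getRequestURI());
  }
}
{code}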



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4512) Provide a knob to turn on over-allocation

2016-01-12 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094493#comment-15094493
 ] 

Inigo Goiri commented on YARN-4512:
---

003 looks good. I would like [~leftnoteasy] to take a look but I'd say any open 
issues (preemption) should be tackled when we do the policies.

> Provide a knob to turn on over-allocation
> -
>
> Key: YARN-4512
> URL: https://issues.apache.org/jira/browse/YARN-4512
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: YARN-4512-YARN-1011.001.patch, 
> yarn-4512-yarn-1011.002.patch, yarn-4512-yarn-1011.003.patch
>
>
> We need two configs for overallocation - one to specify the threshold up to 
> which it is okay to over-allocate, and another to specify the threshold after 
> which OPPORTUNISTIC containers should be preempted.
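
A sketch of what that could look like in yarn-site.xml (the property names 
below are purely hypothetical placeholders, not the ones the patch introduces):

{code}
<property>
  <!-- Over-allocate as long as actual node utilization stays below this. -->
  <name>yarn.nodemanager.overallocation.threshold</name>
  <value>0.75</value>
</property>
<property>
  <!-- Preempt OPPORTUNISTIC containers once utilization crosses this. -->
  <name>yarn.nodemanager.overallocation.preemption-threshold</name>
  <value>0.95</value>
</property>
{code}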



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4492) Add documentation for preemption supported in Capacity scheduler

2016-01-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094506#comment-15094506
 ] 

Eric Payne commented on YARN-4492:
--

Thanks for updating the patch so quickly, [~Naganarasimha]. I think it looks 
good overall. I have a couple of minor grammatical nits:
bq. The CapacityScheduler supports preemption of container(s) from the queues 
whose resource usage is more than their guaranteed capacity. Following 
configuration parameters needs to be enabled in yarn-site.xml, for supporting 
preemption of application containers.
- I would say {{container}} rather than {{container(s)}}
- {{_The_ following configuration parameters _need_ to ...}}
- Take out comma after {{yarn-site.xml}}

bq. yarn.resourcemanager.scheduler.monitor.policies
- {{Configured policies _need_ to be ...}}

bq. Following configuration parameters can be configured in yarn-site.xml, to 
control the preemption of containers when ProportionalCapacityPreemptionPolicy 
class is configured for yarn.resourcemanager.scheduler.monitor.policies
- I would remove the comma after {{yarn-site.xml}}

Other than that, it looks good.
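
For reference, the yarn-site.xml settings under discussion (a minimal sketch 
of the standard CapacityScheduler preemption setup):

{code}
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
{code}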

> Add documentation for preemption supported in Capacity scheduler
> 
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-load cluster but is 
> currently missing documentation. 
> Preemption is not completely documented; hence, update all configurations for 
> capacity scheduler preemption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094576#comment-15094576
 ] 

Hadoop QA commented on YARN-4577:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 36s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s 
{color} | {color:red} Patch generated 2 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 261, now 262). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 30s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s {color} 
| {color:red} hadoop-yarn-api in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 0s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {co

[jira] [Commented] (YARN-1815) RM doesn't recover unmanaged AMs into its memory after restart

2016-01-12 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094629#comment-15094629
 ] 

Subru Krishnan commented on YARN-1815:
--

I agree with [~bikassaha] that unmanaged AM recovery should work exactly like 
that of managed AMs. Additionally, as discussed offline with [~kasha], UAMs are 
critical in the context of Federation (YARN-2915), as we use UAMs to 
transparently scale applications across multiple clusters.

[~kasha], are you planning to work on this? If not, I can take it up as I am 
working on Federation.

> RM doesn't recover unmanaged AMs into its memory after restart
> --
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM doesn't recover unmanaged AMs into its memory after restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2016-01-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4307:

Attachment: YARN-4307.v1.002.patch

Hi [~vvasudev],
I have uploaded a new patch with modifications for your comments.
bq. Can we add the AM blacklisted nodes and the nodes blacklisted by the RM in 
to the app attempt info and expose both of these via webservices? Both fields 
should be a part of the rmAppAttempt.
The earlier patch was also exposing this info in the app attempt info. In the 
latest patch I have taken care of {{RMAppAttempt}}.

> Blacklisted nodes for AM container is not getting displayed in the Web UI
> -
>
> Key: YARN-4307
> URL: https://issues.apache.org/jira/browse/YARN-4307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: AppInfoPage.png, RMappAttempt.png, 
> YARN-4307.v1.001.patch, YARN-4307.v1.002.patch, webpage.png, 
> yarn-capacity-scheduler-debug.log
>
>
> In a pseudo cluster with 2 NMs, launched an app with an incorrect 
> configuration: *./hadoop org.apache.hadoop.mapreduce.SleepJob 
> -Dmapreduce.job.node-label-expression=labelX  
> -Dyarn.app.mapreduce.am.env=JAVA_HOME=/no/jvm/here  -m 5 -mt 1200*.
> The first attempt failed and a 2nd attempt was launched, but the application 
> hung. The scheduler logs showed that localhost was blacklisted, but in the 
> UI (app & apps listing pages) the count was shown as zero and no hosts were 
> listed on the app page. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094837#comment-15094837
 ] 

Hadoop QA commented on YARN-2031:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 31s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy-jdk1.8.0_66
 with JDK v1.8.0_66 generated 2 new issues (was 0, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 43s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy-jdk1.7.0_91
 with JDK v1.7.0_91 generated 2 new issues (was 0, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 8s 
{color} | {color:red} Patch generated 8 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
(total was 20, now 25). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 9s 
{color} | {color:red} hadoop-yarn-server-web-proxy in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 23s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy-jdk1.7.0_91
 with JDK v1.7.0_91 generated 2 new issues (was 0, now 2). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed with 
JDK v1.7.0_91. {co

[jira] [Updated] (YARN-4579) Allow container directory permissions to be configurable

2016-01-12 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4579:
-
Attachment: YARN-4579.002.patch

Fix checkstyle and XML unit test error.

> Allow container directory permissions to be configurable
> 
>
> Key: YARN-4579
> URL: https://issues.apache.org/jira/browse/YARN-4579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-4579.001.patch, YARN-4579.002.patch
>
>
> By default, container directory permissions are hardcoded to this member in 
> DefaultContainerExecutor:
>   static final short LOGDIR_PERM = (short)0710;
> There are some cases where less restrictive permissions are desired.  Make 
> this configurable.
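
A minimal sketch of the configurable approach (the property key below is a 
hypothetical placeholder; the actual key is whatever the patch defines):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsPermission;

public class ContainerDirPerms {
  // Hypothetical key, for illustration only.
  static final String LOGDIR_PERM_KEY =
      "yarn.nodemanager.container-log-dirs.permissions";

  // Parse an octal string such as "710" or "750" from the config,
  // falling back to the previously hardcoded 0710.
  static FsPermission logDirPermission(Configuration conf) {
    return new FsPermission(
        Short.parseShort(conf.get(LOGDIR_PERM_KEY, "710"), 8));
  }
}
{code}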



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4492) Add documentation for preemption supported in Capacity scheduler

2016-01-12 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4492:

Attachment: YARN-4492.v2.002.patch

Thanks for reviewing so fast and sharing the comments, [~eepayne] :). 
Attaching an updated patch fixing the review comments.

> Add documentation for preemption supported in Capacity scheduler
> 
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch, 
> YARN-4492.v2.002.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-load cluster but is 
> currently missing documentation. 
> Preemption is not completely documented; hence, update all configurations for 
> capacity scheduler preemption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094849#comment-15094849
 ] 

Naganarasimha G R commented on YARN-4371:
-

bq. About the log message, the existing method killApplication(applicationId) 
prints a useful message whether the application exists OR doesn't exist. 
Missed this part...

Overall the patch looks fine!
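
For illustration, a minimal sketch of the multi-kill loop and the existence 
message being discussed (simplified; not the actual patch):

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;

class MultiKillSketch {
  static void killAll(YarnClient client, String... appIds)
      throws IOException, YarnException {
    for (String idStr : appIds) {
      ApplicationId appId = ConverterUtils.toApplicationId(idStr);
      try {
        client.killApplication(appId);
        System.out.println("Killed application " + idStr);
      } catch (ApplicationNotFoundException e) {
        // The useful "doesn't exist" message referred to above.
        System.err.println("Application " + idStr + " doesn't exist.");
      }
    }
  }
}
{code}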

> "yarn application -kill" should take multiple application ids
> -
>
> Key: YARN-4371
> URL: https://issues.apache.org/jira/browse/YARN-4371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>Assignee: Sunil G
> Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch, 
> 0003-YARN-4371.patch, 0004-YARN-4371.patch
>
>
> Currently we cannot pass multiple applications to the "yarn application -kill" 
> command. The command should take multiple application ids at the same time. 
> Each entry should be separated with whitespace, like:
> {code}
> yarn application -kill application_1234_0001 application_1234_0007 
> application_1234_0012
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4492) Add documentation for preemption supported in Capacity scheduler

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094881#comment-15094881
 ] 

Hadoop QA commented on YARN-4492:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 18s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12781908/YARN-4492.v2.002.patch
 |
| JIRA Issue | YARN-4492 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 528274ef8375 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 126705f |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Max memory used | 29MB |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10251/console |


This message was automatically generated.



> Add documentation for preemption supported in Capacity scheduler
> 
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch, 
> YARN-4492.v2.002.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-load cluster but is 
> currently missing documentation. 
> Preemption is not completely documented; hence, update all configurations for 
> capacity scheduler preemption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094884#comment-15094884
 ] 

Hadoop QA commented on YARN-4304:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 22 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 261, now 271). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 50s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 3s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 138m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesForCSWithPartitions |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |

[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers

2016-01-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094888#comment-15094888
 ] 

Vinod Kumar Vavilapalli commented on YARN-1856:
---

[~vvasudev] / [~kasha], it seems like there are a couple of key proposals 
here; let's fork them off to separate tickets so they get the attention they 
deserve. 

> cgroups based memory monitoring for containers
> --
>
> Key: YARN-1856
> URL: https://issues.apache.org/jira/browse/YARN-1856
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Varun Vasudev
> Fix For: 2.9.0
>
> Attachments: YARN-1856.001.patch, YARN-1856.002.patch, 
> YARN-1856.003.patch, YARN-1856.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094874#comment-15094874
 ] 

Naganarasimha G R commented on YARN-4583:
-

Given that AHS (including FileSystemWriter) is already a deprecated feature in 
the community and is planned to be completely removed (YARN-4542), do we need 
to work further on this issue?
ATS (Application Timeline Service) has been its replacement since 2.6.0. Do 
you have a plan to migrate to ATS instead of AHS?


> Resource manager should purge generic history data when using 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-4583
> URL: https://issues.apache.org/jira/browse/YARN-4583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1, 2.7.0, 2.7.1
>Reporter: Johan Gustavsson
> Attachments: YARN-4583.001.patch, YARN-4583.patch
>
>
> In its current state, when enabling 
> `yarn.timeline-service.generic-application-history.enabled` and setting 
> `yarn.timeline-service.generic-application-history.store-class` to 
> `org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
>  files keep building up in the dir until it reaches the max files for a dir. 
> There should be a way to set the RM to purge these files.
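
For reference, the configuration described above:

{code}
<property>
  <name>yarn.timeline-service.generic-application-history.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.generic-application-history.store-class</name>
  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
</property>
{code}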



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4492) Add documentation for preemption supported in Capacity scheduler

2016-01-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094908#comment-15094908
 ] 

Eric Payne commented on YARN-4492:
--

Thanks [~Naganarasimha].

+1 (non-binding)

> Add documentation for preemption supported in Capacity scheduler
> 
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, YARN-4492.v2.001.patch, 
> YARN-4492.v2.002.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a multi-load cluster but is 
> currently missing documentation. 
> Preemption is not completely documented; hence, update all configurations for 
> capacity scheduler preemption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2016-01-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094923#comment-15094923
 ] 

Jian He commented on YARN-4497:
---

[~hex108], thanks for working on this. 
For the patch, I think making the change below in RMAppImpl#recover may be 
enough?
{code}
-for(int i=0; i
{code}

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Critical
> Attachments: YARN-4497.01.patch
>
>
> Found the following problem while discussing YARN-3480.
> If the RM fails to store some attempts in the RMStateStore, there will be 
> missing attempts in the RMStateStore. For the case of storing attempt1, 
> attempt2 and attempt3, the RM successfully stored attempt1 and attempt3, but 
> failed to store attempt2. When the RM restarts, in *RMAppImpl#recover*, we 
> recover attempts one by one; for this case, we will recover attempt1, then 
> attempt2. When recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*; it will first look 
> for its ApplicationAttemptStateData, but it cannot find it, so an error comes 
> up at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).
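
To make the failure mode concrete, a minimal sketch of a recovery loop that 
tolerates a missing attempt (simplified types; not the actual RMAppImpl code):

{code}
import java.util.Map;

class AttemptRecoverySketch {
  // storedAttempts maps attempt id -> persisted state; a store failure
  // can leave a hole (e.g., attempt2 missing while attempt3 exists).
  static void recoverAttempts(Map<Integer, Object> storedAttempts,
      int lastAttemptId) {
    for (int i = 1; i <= lastAttemptId; i++) {
      Object attemptState = storedAttempts.get(i);
      if (attemptState == null) {
        // Skip the hole instead of tripping "assert attemptState != null".
        continue;
      }
      // ... rebuild the RMAppAttempt from attemptState ...
    }
  }
}
{code}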



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094934#comment-15094934
 ] 

Hadoop QA commented on YARN-4371:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 9s 
{color} | {color:red} Patch generated 3 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client (total was 15, now 17). 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 21s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 25s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 34s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestYarnClient |
|   | org.apache.hadoop.yarn.client.api.impl.TestNMClient |
| JDK v1.7.0_91 Failed junit tests | hadoop.yarn.client.TestGetGroups |
| JDK v1.7.0_91 Timed out junit tests | 
org.apache.hadoop.yarn.client.cli.TestYarnCLI |
|   | org.apache.hadoop.yarn.client.api.impl.Tes

[jira] [Commented] (YARN-4307) Blacklisted nodes for AM container is not getting displayed in the Web UI

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094935#comment-15094935
 ] 

Hadoop QA commented on YARN-4307:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 11s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 11s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 1m 24s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 24s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 437, now 440). {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 8s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 32m 9s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus

[jira] [Commented] (YARN-4579) Allow container directory permissions to be configurable

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094970#comment-15094970
 ] 

Hadoop QA commented on YARN-4579:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 52s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 50s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 50s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 57s 
{color} | {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 57s {color} 
| {color:red} hadoop-yarn in the patch failed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 234, now 234). {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 45s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 15s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} uni

[jira] [Updated] (YARN-4579) Allow container directory permissions to be configurable

2016-01-12 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4579:
-
Attachment: YARN-4579.003.patch

Get patch from correct directory this time.

> Allow container directory permissions to be configurable
> 
>
> Key: YARN-4579
> URL: https://issues.apache.org/jira/browse/YARN-4579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-4579.001.patch, YARN-4579.002.patch, 
> YARN-4579.003.patch
>
>
> By default, container directory permissions are hardcoded to this member in 
> DefaultContainerExecutor:
>   static final short LOGDIR_PERM = (short)0710;
> There are some cases where less restrictive permissions are desired.  Make 
> this configurable.
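
For illustration, a minimal sketch of the configurable direction described above. The configuration key and default here are assumptions, not necessarily what the patch uses:

{code}
import org.apache.hadoop.conf.Configuration;

public class ContainerDirPermsSketch {
  // Hypothetical key; the actual patch may name it differently.
  static final String LOGDIR_PERM_KEY =
      "yarn.nodemanager.default-container-executor.log-dirs.permissions";
  static final String DEFAULT_LOGDIR_PERM = "710";

  /**
   * Reads the log-dir permission from configuration instead of the
   * hardcoded 0710, parsing the value as an octal mode string.
   */
  static short getLogDirPermissions(Configuration conf) {
    String octal = conf.get(LOGDIR_PERM_KEY, DEFAULT_LOGDIR_PERM);
    return Short.parseShort(octal, 8); // "710" parsed as octal 0710
  }
}
{code}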



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3446) FairScheduler HeadRoom calculation should exclude nodes in the blacklist.

2016-01-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095060#comment-15095060
 ] 

Karthik Kambatla commented on YARN-3446:


Patch looks good, but for one minor comment: can we rename 
{{AbstractYarnScheduler#getBlackListNodeIds}} to {{addBlacklistedNodeIdsToList}} 
to capture the behavior of adding the nodeIds to the list that is passed in? 
Also, given that the method is used by all schedulers, we might want to add a 
javadoc briefly explaining what it does.
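
For illustration, a self-contained sketch of what the suggested rename conveys; String stands in for NodeId here, and the names are hypothetical, not the actual patch:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlacklistSketch {

  /**
   * Adds the ids of all blacklisted nodes to the supplied list. The method
   * mutates its argument instead of returning a new list, which is what the
   * suggested name is meant to convey.
   */
  static void addBlacklistedNodeIdsToList(List<String> allNodeIds,
      Set<String> blacklist, List<String> blacklistedNodeIds) {
    for (String nodeId : allNodeIds) {
      if (blacklist.contains(nodeId)) {
        blacklistedNodeIds.add(nodeId);
      }
    }
  }

  public static void main(String[] args) {
    List<String> out = new ArrayList<>();
    addBlacklistedNodeIdsToList(Arrays.asList("n1", "n2", "n3"),
        new HashSet<>(Arrays.asList("n2")), out);
    System.out.println(out); // prints [n2]
  }
}
{code}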



> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> -
>
> Key: YARN-3446
> URL: https://issues.apache.org/jira/browse/YARN-3446
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3446.000.patch, YARN-3446.001.patch, 
> YARN-3446.002.patch, YARN-3446.003.patch, YARN-3446.004.patch
>
>
> FairScheduler HeadRoom calculation should exclude nodes in the blacklist.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom takes blacklisted nodes into account. This makes jobs 
> hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResource the AM gets from the RM includes 
> the blacklisted nodes' available resources).
> This issue is similar to YARN-1680, which is for the Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-01-12 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-4062:
-
Attachment: YARN-4062-feature-YARN-2928.02.patch


Uploading patch v2. I have made all the changes suggested by Sangjin.

The only change that has not been made yet is:

bq. can you add this parameter as a real configuration (in 
YarnConfiguration.java/yarn-default.xml)?

The reason is that I am trying to figure out how to access the YARN 
configuration variable in the HBase coprocessor. The conf, or that specific 
variable/value, needs to be passed around somehow. This setting comes from the 
source Hadoop cluster and is to be used on the sink HBase/Hadoop cluster. 
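
For illustration only, one way a coprocessor can read a setting from the HBase-side configuration it is loaded with; the property name is hypothetical, and wiring the value over from the source cluster's yarn-site.xml is exactly the open question above:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;

public class FlowRunConfigSketch extends BaseRegionObserver {
  private long retentionMillis;

  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    // This reads the configuration of the cluster the coprocessor runs on,
    // i.e. the sink HBase cluster, not the source YARN cluster.
    Configuration conf = env.getConfiguration();
    retentionMillis = conf.getLong(
        "yarn.timeline-service.flow-run.retention-ms", // hypothetical key
        3L * 24 * 60 * 60 * 1000);
  }
}
{code}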


> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch, YARN-4062-feature-YARN-2928.02.patch
>
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing 
> into the flow_run table. It also needs flush & compaction processing in the 
> coprocessor, and perhaps a new scanner to deal with the data during the 
> flushing and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095129#comment-15095129
 ] 

Hadoop QA commented on YARN-4062:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
1s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} Patch generated 19 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 (total was 44, now 62). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 12 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 37s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.7.0_91
 with JDK v1.7.0_91 generated 5 new issues (was 0, now 5). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 3s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 1s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 51s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || R

[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2016-01-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2024:
-
Priority: Major  (was: Critical)

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.
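
For illustration, a simplified, self-contained sketch of the two fixes the report implies; the names are hypothetical, not the actual patch: log the exception itself so the stacktrace is preserved, and stop appending once the writer is in a bad state:

{code}
import java.io.IOException;

public class LogUploadSketch {
  private boolean writerBroken = false;

  void uploadLogsForContainer(String containerId) {
    if (writerBroken) {
      return; // the TFile writer is unusable after a failed append
    }
    try {
      append(containerId);
    } catch (IOException e) {
      // Printing the exception preserves the stacktrace that the original
      // "Couldn't upload logs" message dropped.
      System.err.println("Couldn't upload logs for " + containerId
          + ". Skipping this container.");
      e.printStackTrace();
      writerBroken = true;
    }
  }

  // Stand-in for LogWriter#append failing on a huge log file.
  private void append(String containerId) throws IOException {
    throw new IOException("stand-in failure");
  }
}
{code}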



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1815) RM doesn't recover unmanaged AMs into its memory after restart

2016-01-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-1815:
--

Assignee: Subru Krishnan  (was: Karthik Kambatla)

All yours. Thanks for picking this up. 

> RM doesn't recover unmanaged AMs into its memory after restart
> --
>
> Key: YARN-1815
> URL: https://issues.apache.org/jira/browse/YARN-1815
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
> yarn-1815-2.patch, yarn-1815-2.patch
>
>
> RM doesn't recover unmanaged AMs into its memory after restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4526) Make SystemClock singleton so AppSchedulingInfo could use it

2016-01-12 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095162#comment-15095162
 ] 

Arun Suresh commented on YARN-4526:
---

[~kasha], the latest patch looks good. Can you kick Jenkins once more?
+1 pending that.

> Make SystemClock singleton so AppSchedulingInfo could use it
> 
>
> Key: YARN-4526
> URL: https://issues.apache.org/jira/browse/YARN-4526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4526-1.patch, yarn-4526-2.patch
>
>
> To track the time a request is received, we need to get current system time. 
> For better testability of this, we are likely better off using a Clock 
> instance that uses SystemClock by default. Instead of creating umpteen 
> instances of SystemClock, we should just reuse the same instance. 
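
For illustration, a minimal sketch of the singleton direction described above; the accessor name is an assumption, and the local Clock interface is a stand-in kept here only so the sketch is self-contained:

{code}
// Stand-in for the YARN Clock interface.
interface Clock {
  long getTime();
}

public class SystemClockSketch implements Clock {
  private static final SystemClockSketch INSTANCE = new SystemClockSketch();

  // Hypothetical accessor name; the point is that callers share one instance.
  public static SystemClockSketch getInstance() {
    return INSTANCE;
  }

  private SystemClockSketch() {
    // private constructor prevents creating umpteen equivalent instances
  }

  @Override
  public long getTime() {
    return System.currentTimeMillis();
  }
}
{code}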



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4579) Allow container directory permissions to be configurable

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095171#comment-15095171
 ] 

Hadoop QA commented on YARN-4579:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 50s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 234, now 234). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 56s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 47s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 7s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {c

[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-01-12 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095209#comment-15095209
 ] 

Vrushali C commented on YARN-4062:
--

I see the javadoc and whitespace warnings; I am fixing them. 

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch, YARN-4062-feature-YARN-2928.02.patch
>
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing 
> into the flow_run table. It also needs flush & compaction processing in the 
> coprocessor, and perhaps a new scanner to deal with the data during the 
> flushing and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4579) Allow container directory permissions to be configurable

2016-01-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095222#comment-15095222
 ] 

Ray Chiang commented on YARN-4579:
--

RE: checkstyle

The file length already exceeded the limit.


> Allow container directory permissions to be configurable
> 
>
> Key: YARN-4579
> URL: https://issues.apache.org/jira/browse/YARN-4579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-4579.001.patch, YARN-4579.002.patch, 
> YARN-4579.003.patch
>
>
> By default, container directory permissions are hardcoded to this member in 
> DefaultContainerExecutor:
>   static final short LOGDIR_PERM = (short)0710;
> There are some cases where less restrictive permissions are desired.  Make 
> this configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2016-01-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095259#comment-15095259
 ] 

Vinod Kumar Vavilapalli commented on YARN-3542:
---

Okay, clearly most of my comments were rooted in the following question:
bq. This effectively means the old code is not used anymore, and that the new 
code is stable.

And it seems like we are stating that
bq. [..], there should be no issue hooking into the new handler using the old 
configuration mechanism.

Given this, and the fact that we are internally overriding to use the new 
handlers, is there a reason for keeping the old code at all? Also, if we are 
using the new handler code internally anyway, can we proceed with the 
deprecation (or, better, deletion) of the LCEResourcesHandler interface, 
DefaultLCEResourcesHandler, etc.?

bq. When the user sets the old CgroupsLCEResourcesHandler, you are internally 
resetting it to DefaultLCEResourcesHandler (inside LinuxContainerExecutor) and 
using that as a control to stop using the older handler.
Instead of doing this implicitly, how about we completely remove 
LinuxContainerExecutor.resourcesHandler etc., as they don't perform any real 
function anymore?
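
For illustration, a sketch of the deprecation route; the interface body is a simplified stand-in, not the actual signatures:

{code}
// Simplified stand-in, not the real interface: marking the pre-ResourceHandler
// path deprecated gives existing users a compile-time warning before deletion.
@Deprecated // superseded by the ResourceHandler mechanism added in YARN-3443
interface LCEResourcesHandler {
  void preExecute(String containerId, int cpuVcores);
  void postExecute(String containerId);
}
{code}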

> Re-factor support for CPU as a resource using the new ResourceHandler 
> mechanism
> ---
>
> Key: YARN-3542
> URL: https://issues.apache.org/jira/browse/YARN-3542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-3542.001.patch, YARN-3542.002.patch, 
> YARN-3542.003.patch, YARN-3542.004.patch, YARN-3542.005.patch, 
> YARN-3542.006.patch, YARN-3542.007.patch
>
>
> In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier 
> addition of new resource types in the nodemanager (this was used for network 
> as a resource - See YARN-2140 ). We should refactor the existing CPU 
> implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using 
> the new ResourceHandler mechanism. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3102) Decommisioned Nodes not listed in Web UI

2016-01-12 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-3102:
--
Attachment: YARN-3102-v3.patch

Thank you very much [~templedf] and [~jlowe] for the review. I have updated the 
patch accordingly. There is an additional test case with one node that is in 
the exclude list but does not register. It is accounted for during RM startup, 
since it is present in the list, but it does not impact metrics during 
refreshNodes, because there we only look at RMNode instances of active nodes, 
which this host is not.

> Decommisioned Nodes not listed in Web UI
> 
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager 
>Reporter: Bibin A Chundatt
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, 
> YARN-3102-v3.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to point 
> to the yarn.exclude file on the RM1 machine
> Add the NM1 host name to yarn.exclude
> Start the nodes as listed: NM1, NM2, Resource Manager
> Now check the decommissioned nodes in /cluster/nodes
> The number of decommissioned nodes is listed as 1, but the table is empty in 
> /cluster/nodes/decommissioned (details of the decommissioned node are not shown)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2016-01-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2031:
-
Attachment: YARN-2031-004.patch

Patch -004: addresses the Yetus issues.

Checkstyle is failing on method length; tough.

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2031-002.patch, YARN-2031-003.patch, 
> YARN-2031-004.patch, YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy needs to forward the 
> verbs, or the AM filter should not filter the REST part of the web site
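
For illustration, a minimal sketch of why the status code matters; this is a hypothetical filter, not the actual AmIpFilter. A 307 tells clients to repeat the same verb at the new location, while a 302 lets them downgrade PUT/POST/DELETE to GET:

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class TemporaryRedirectFilterSketch implements Filter {
  private String proxyBase; // assumed init-param with the proxy URL prefix

  @Override
  public void init(FilterConfig conf) {
    proxyBase = conf.getInitParameter("proxyBase");
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletResponse resp = (HttpServletResponse) res;
    // 307 Temporary Redirect preserves the request method; 302 does not.
    // This sketch redirects everything; a real filter would pass some
    // requests through via chain.doFilter.
    resp.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);
    resp.setHeader("Location", proxyBase);
  }

  @Override
  public void destroy() {
  }
}
{code}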



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3102) Decommisioned Nodes not listed in Web UI

2016-01-12 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-3102:
--
Attachment: YARN-4311-v7.patch

While updating the test for YARN-3102, I modified {{writeToHostsFile}} to take 
the file object as an argument. I am updating this patch with the same change, 
to be consistent. No changes to the non-test code.

> Decommisioned Nodes not listed in Web UI
> 
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager 
>Reporter: Bibin A Chundatt
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, 
> YARN-3102-v3.patch, YARN-4311-v7.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to point 
> to the yarn.exclude file on the RM1 machine
> Add the NM1 host name to yarn.exclude
> Start the nodes as listed: NM1, NM2, Resource Manager
> Now check the decommissioned nodes in /cluster/nodes
> The number of decommissioned nodes is listed as 1, but the table is empty in 
> /cluster/nodes/decommissioned (details of the decommissioned node are not shown)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3102) Decommisioned Nodes not listed in Web UI

2016-01-12 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-3102:
--
Attachment: (was: YARN-4311-v7.patch)

> Decommisioned Nodes not listed in Web UI
> 
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager 
>Reporter: Bibin A Chundatt
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, 
> YARN-3102-v3.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to point 
> to the yarn.exclude file on the RM1 machine
> Add the NM1 host name to yarn.exclude
> Start the nodes as listed: NM1, NM2, Resource Manager
> Now check the decommissioned nodes in /cluster/nodes
> The number of decommissioned nodes is listed as 1, but the table is empty in 
> /cluster/nodes/decommissioned (details of the decommissioned node are not shown)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2016-01-12 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4311:
--
Attachment: YARN-4311-v7.patch

While updating the test for YARN-3102, I modified {{writeToHostsFile}} to take 
the file object as an argument. I am updating this patch with the same change, 
to be consistent. No changes to the non-test code.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, 
> YARN-4311-v6.patch, YARN-4311-v7.patch
>
>
> In order to fully forget about a node, removing it from the include and 
> exclude lists is not sufficient: the RM still lists it under decommissioned 
> nodes. The tricky part that [~jlowe] pointed out was the case when include 
> lists are not used; in that case we don't want the nodes to fall off if they 
> are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI

2016-01-12 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095319#comment-15095319
 ] 

Kuhu Shukla commented on YARN-3102:
---

Please ignore; I updated the wrong JIRA. 

> Decommisioned Nodes not listed in Web UI
> 
>
> Key: YARN-3102
> URL: https://issues.apache.org/jira/browse/YARN-3102
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
> Environment: 2 Node Manager and 1 Resource Manager 
>Reporter: Bibin A Chundatt
>Assignee: Kuhu Shukla
>Priority: Minor
> Attachments: YARN-3102-v1.patch, YARN-3102-v2.patch, 
> YARN-3102-v3.patch
>
>
> Configure yarn.resourcemanager.nodes.exclude-path in yarn-site.xml to point 
> to the yarn.exclude file on the RM1 machine
> Add the NM1 host name to yarn.exclude
> Start the nodes as listed: NM1, NM2, Resource Manager
> Now check the decommissioned nodes in /cluster/nodes
> The number of decommissioned nodes is listed as 1, but the table is empty in 
> /cluster/nodes/decommissioned (details of the decommissioned node are not shown)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095337#comment-15095337
 ] 

Hadoop QA commented on YARN-2031:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 30s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy-jdk1.8.0_66
 with JDK v1.8.0_66 generated 2 new issues (was 0, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 43s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy-jdk1.7.0_91
 with JDK v1.7.0_91 generated 2 new issues (was 0, now 2). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 8s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
(total was 20, now 22). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 57s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy-jdk1.8.0_66
 with JDK v1.8.0_66 generated 8 new issues (was 25, now 26). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+

[jira] [Commented] (YARN-4583) Resource manager should purge generic history data when using FileSystemApplicationHistoryStore

2016-01-12 Thread Johan Gustavsson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095342#comment-15095342
 ] 

Johan Gustavsson commented on YARN-4583:


If FileSystemWriter is deprecated, you can go ahead and void this. I originally 
wrote this patch for 2.4.1, then noticed there was no similar function in 
trunk, so I ported it and submitted it. I am planning on using ATS once I 
upgrade to 2.7.*, but I haven't had time to look into the setup yet.

> Resource manager should purge generic history data when using 
> FileSystemApplicationHistoryStore
> ---
>
> Key: YARN-4583
> URL: https://issues.apache.org/jira/browse/YARN-4583
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.4.1, 2.7.0, 2.7.1
>Reporter: Johan Gustavsson
> Attachments: YARN-4583.001.patch, YARN-4583.patch
>
>
> In its current state, when enabling 
> `yarn.timeline-service.generic-application-history.enabled` and setting 
> `yarn.timeline-service.generic-application-history.store-class` to 
> `org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore`,
>  files keep building up in the directory until it reaches the maximum number 
> of files per directory. There should be a way to set the RM to purge these 
> files.
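
For illustration, a simplified sketch of the requested purge behavior; the configuration key and retention default are hypothetical:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HistoryPurgeSketch {
  // Hypothetical key; the real patch would define its own.
  static final String RETENTION_MS_KEY =
      "yarn.timeline-service.generic-application-history.retention-ms";

  static void purge(FileSystem fs, Path historyDir, Configuration conf)
      throws IOException {
    long retentionMs = conf.getLong(RETENTION_MS_KEY,
        7L * 24 * 60 * 60 * 1000);
    long now = System.currentTimeMillis();
    for (FileStatus f : fs.listStatus(historyDir)) {
      // Delete history files whose last modification is past the retention.
      if (now - f.getModificationTime() > retentionMs) {
        fs.delete(f.getPath(), false);
      }
    }
  }
}
{code}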



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4581) thread leak makes RM crash while RM is recovering

2016-01-12 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095369#comment-15095369
 ] 

sandflee commented on YARN-4581:


Thanks [~Naganarasimha] [~djp]. Our cluster is based on 2.4.1, and we will use 
ATS once we upgrade the cluster.

> thread leak makes RM crash while RM is recovering
> -
>
> Key: YARN-4581
> URL: https://issues.apache.org/jira/browse/YARN-4581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4581.01.patch
>
>
> we enabled the ApplicationHistoryWriter and found thousands of errors:
> {quote}
> 2016-01-08 03:13:03,441 ERROR 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
>  Error when openning history file of application 
> application_1451878591907_0197
> java.io.IOException: Output file not at zero offset.
> at 
> org.apache.hadoop.io.file.tfile.BCFile$Writer.(BCFile.java:288)
> at org.apache.hadoop.io.file.tfile.TFile$Writer.(TFile.java:288)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.(FileSystemApplicationHistoryStore.java:728)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> and this led to the RM crashing:
> {quote}
> 2016-01-08 03:13:08,335 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2033)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1652)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1573)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1603)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1591)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:324)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:324)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.(FileSystemApplicationHistoryStore.java:723)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> after several failovers, the RM finishes recovering, but thousands of HDFS 
> client threads are leaked in the RM.
> {quote}
> "Thread-22723" #22893 daemon prio=5 os_prio=0 tid=0x7f75f0346000 
> nid=0x132e in Object.wait() [0x7f74ea7ca000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at 
> org.apa

[jira] [Updated] (YARN-4526) Make SystemClock singleton so AppSchedulingInfo could use it

2016-01-12 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4526:
---
Attachment: yarn-4526-2.patch

Uploading the same patch again to kick Jenkins. 

> Make SystemClock singleton so AppSchedulingInfo could use it
> 
>
> Key: YARN-4526
> URL: https://issues.apache.org/jira/browse/YARN-4526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4526-1.patch, yarn-4526-2.patch, yarn-4526-2.patch
>
>
> To track the time a request is received, we need to get current system time. 
> For better testability of this, we are likely better off using a Clock 
> instance that uses SystemClock by default. Instead of creating umpteen 
> instances of SystemClock, we should just reuse the same instance. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4502) Sometimes Two AM containers get launched

2016-01-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-4502:
-

Assignee: Vinod Kumar Vavilapalli  (was: Wangda Tan)

bq. If the add-application-attempt-event is sent to the scheduler before the 
container-rescheduled-event arrives, the application attempt will be replaced, 
so the resource request will be restored to the next attempt.
Seemed like a wild theory on first look, but it isn't, and you are right! 
Because the current flow is {{ContainerRescheduledTransition -> RM level event 
handler + queue -> Scheduler event handler + queue}}, it is actually very 
likely for this to happen.

Once we remove this flow and let the scheduler do the {{kill-container + 
recover-requests}} in one shot, none of this routing to the wrong attempt will 
happen anymore.

Let me take a crack at this, assigning this to myself.

Side issue discovered: while the app-attempt finish is in the process of being 
saved to the state-store, the scheduler can happily go on allocating more and 
more containers for the (finishing) app-attempt!

> Sometimes Two AM containers get launched
> 
>
> Key: YARN-4502
> URL: https://issues.apache.org/jira/browse/YARN-4502
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
>
> Scenario : 
> * set yarn.resourcemanager.am.max-attempts = 2
> * start dshell application
> {code}
>  yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> hadoop-yarn-applications-distributedshell-*.jar 
> -attempt_failures_validity_interval 6 -shell_command "sleep 150" 
> -num_containers 16
> {code}
> * Kill AM pid
> * Print container list for 2nd attempt
> {code}
> yarn container -list appattempt_1450825622869_0001_02
> INFO impl.TimelineClientImpl: Timeline service address: 
> http://xxx:port/ws/v1/timeline/
> INFO client.RMProxy: Connecting to ResourceManager at xxx/10.10.10.10:
> Total number of containers :2
> Container-Id Start Time Finish Time   
> StateHost   Node Http Address 
>LOG-URL
> container_e12_1450825622869_0001_02_02 Tue Dec 22 23:07:35 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_02/hrt_qa
> container_e12_1450825622869_0001_02_01 Tue Dec 22 23:07:34 + 2015 
>   N/A RUNNINGxxx:25454   http://xxx:8042 
> http://xxx:8042/node/containerlogs/container_e12_1450825622869_0001_02_01/hrt_qa
> {code}
> * look for the new AM pid 
> Here, the 2nd AM container was supposed to be started on 
> container_e12_1450825622869_0001_02_01. But the AM was not launched on 
> container_e12_1450825622869_0001_02_01; it was in the ACQUIRED state. 
> On the other hand, container_e12_1450825622869_0001_02_02 got the AM running. 
> Expected behavior: the RM should not start 2 containers to launch the AM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2016-01-12 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095389#comment-15095389
 ] 

Jun Gong commented on YARN-4497:


[~jianhe] Thanks for review and comments.

{quote}
for the patch, I think making below change in RMAppImpl#recover may be enough ?
{quote}
There might be some problems (see the sketch below):
1. *appState.attempts.keySet()* is not sorted by attempt ID, but we need to 
recover the attempts in order, because we use *currentAttempt* to get the 
AMBlacklist and we call *getNumFailedAppAttempts()* in *createNewAttempt()*.
2. We need to update *nextAttemptId* after recovering the attempts.
3. We need to deal with case 2 from the previous comment: the attempt's final 
state is missing (the RM failed to store it); otherwise this will cause the RM 
to relaunch the attempt: it will be in the *LAUNCHED* state after recovery, 
will time out (the attempt has already failed), and then the RM will relaunch 
it.
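
For illustration, a simplified, self-contained sketch of points 1 and 2; plain ints stand in for ApplicationAttemptId, and this is not the actual patch:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecoverAttemptsSketch {
  public static void main(String[] args) {
    Map<Integer, String> attempts = new HashMap<>();
    attempts.put(3, "attempt3");
    attempts.put(1, "attempt1"); // attempt2 missing: RM failed to store it
    // keySet() gives no ordering guarantee, so sort before recovering.
    List<Integer> ids = new ArrayList<>(attempts.keySet());
    Collections.sort(ids);
    for (int id : ids) {
      System.out.println("recovering " + attempts.get(id));
    }
    // Advance nextAttemptId past the highest recovered attempt.
    int nextAttemptId = ids.isEmpty() ? 1 : ids.get(ids.size() - 1) + 1;
    System.out.println("nextAttemptId = " + nextAttemptId);
  }
}
{code}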

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Critical
> Attachments: YARN-4497.01.patch
>
>
> Found the following problem while discussing YARN-3480.
> If the RM fails to store some attempts in the RMStateStore, there will be 
> missing attempts in the RMStateStore. For example, when storing attempt1, 
> attempt2 and attempt3, the RM successfully stored attempt1 and attempt3 but 
> failed to store attempt2. When the RM restarts, in *RMAppImpl#recover* we 
> recover the attempts one by one; in this case, we recover attempt1, then 
> attempt2. When recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks 
> up its ApplicationAttemptStateData; since it cannot find it, an error occurs 
> at *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4584) RM start failure when AM is preempted many times

2016-01-12 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095395#comment-15095395
 ] 

Jun Gong commented on YARN-4584:


[~bibinchundatt] If attempts 1~28 are removed and attempts 29~31 have been 
saved to the app store successfully, there will be no NPE during RM recovery. 
Could you share more of the RM log? Thanks.

> RM start failure when AM is preempted many times
> 
>
> Key: YARN-4584
> URL: https://issues.apache.org/jira/browse/YARN-4584
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> Due to a resource limit in the queue, the AM got preempted about 20 times. 
> On restart, the RM fails to start:
> {noformat}
> 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: 
> noteFailure java.lang.NullPointerException
> 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state STARTED; cause: 
> java.lang.NullPointerException
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.AbstractService: 
> Service: RMActiveServices entered state STOPPED
> 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service.CompositeService: 
> RMActiveServices: stopping services, size=16
> {noformat}
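
For illustration, a hedged, self-contained sketch of the defensive pattern 
that avoids the NPE in the trace above: treat a missing attempt record as 
"attempt lost" instead of dereferencing it. Names and flow are hypothetical, 
not the committed fix.

{code:java}
// Hypothetical names, not the committed fix: skip an attempt whose state the
// store lost instead of dereferencing null (the NPE at
// RMAppAttemptImpl.recover line 887 in the trace above).
public class RecoverGuardSketch {
  static final class AttemptState {
    final String finalState;
    AttemptState(String finalState) { this.finalState = finalState; }
  }

  static void recoverAttempt(String attemptId, AttemptState stored) {
    if (stored == null) {
      // Without this guard, stored.finalState below throws the NPE.
      System.out.println("WARN: no stored state for " + attemptId
          + ", marking attempt as lost and continuing recovery");
      return;
    }
    System.out.println("recovered " + attemptId + " -> " + stored.finalState);
  }

  public static void main(String[] args) {
    recoverAttempt("appattempt_0001_000029", new AttemptState("FAILED"));
    recoverAttempt("appattempt_0001_000030", null); // removed from the store
  }
}
{code}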



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4581) thread leak makes RM crash while RM is recovering

2016-01-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095469#comment-15095469
 ] 

Vinod Kumar Vavilapalli commented on YARN-4581:
---

[~sandflee] / [~djp] / [~Naganarasimha] Given that the patch is 
straightforward, shall we just get it in for folks on older versions? I don't 
see any downside to including the patch.

> thread leak makes RM crash while RM is recovering
> -
>
> Key: YARN-4581
> URL: https://issues.apache.org/jira/browse/YARN-4581
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4581.01.patch
>
>
> We enabled ApplicationHistoryWriter and found thousands of errors like this:
> {quote}
> 2016-01-08 03:13:03,441 ERROR 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
>  Error when openning history file of application 
> application_1451878591907_0197
> java.io.IOException: Output file not at zero offset.
> at 
> org.apache.hadoop.io.file.tfile.BCFile$Writer.<init>(BCFile.java:288)
> at org.apache.hadoop.io.file.tfile.TFile$Writer.<init>(TFile.java:288)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:728)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> and this leads to an RM crash:
> {quote}
> 2016-01-08 03:13:08,335 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2033)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1652)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1573)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1603)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1591)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:324)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:324)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileWriter.<init>(FileSystemApplicationHistoryStore.java:723)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.applicationStarted(FileSystemApplicationHistoryStore.java:418)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter.handleWritingApplicationHistoryEvent(RMApplicationHistoryWriter.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:297)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter$ForwardingEventHandler.handle(RMApplicationHistoryWriter.java:292)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> After several failovers, once the RM finishes recovering, thousands of HDFS 
> client threads are leaked in the RM (see the close-on-failure sketch after 
> the thread dump below).
> {quote}
> "Thread-22723" #22893 daemon prio=5 os_prio=0 tid=0x7f75f0346000 
> nid=0x132e in Object.wait() [0x7f74ea7ca000]
>java.lang.Thread.Stat
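
A self-contained sketch of the general close-on-failure pattern that prevents 
this kind of leak: if the history file cannot be opened at zero offset, close 
the stream before rethrowing so its background writer thread dies with it. 
The stand-in stream and check below are hypothetical, not the actual 
YARN-4581 patch.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Stand-ins are hypothetical: in the real bug each DFS append starts a
// DataStreamer thread inside DFSOutputStream, and a failed open leaks it.
public class CloseOnFailureSketch {
  static OutputStream openAppendStream() {
    return new ByteArrayOutputStream(); // stand-in for fs.append(...)
  }

  static void checkZeroOffset(OutputStream out) throws IOException {
    // Stand-in for the BCFile check that throws in the log above.
    throw new IOException("Output file not at zero offset.");
  }

  static OutputStream openHistoryFile() throws IOException {
    OutputStream out = openAppendStream();
    try {
      checkZeroOffset(out);
      return out;
    } catch (IOException e) {
      out.close(); // close on failure so the writer thread dies with it
      throw e;
    }
  }

  public static void main(String[] args) {
    for (int i = 0; i < 3; i++) { // thousands of these during recovery
      try {
        openHistoryFile();
      } catch (IOException e) {
        System.out.println("open failed: " + e.getMessage());
      }
    }
  }
}
{code}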

[jira] [Commented] (YARN-4579) Allow container directory permissions to be configurable

2016-01-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095475#comment-15095475
 ] 

Vinod Kumar Vavilapalli commented on YARN-4579:
---

bq. There are some cases where less restrictive permissions are desired. Make 
this configurable.
Can you elaborate on this?

If the use-case has merit, this shouldn't be limited only to 
DefaultContainerExecutor. And depending on what it is, it may (or may not) 
apply to other dirs like local-dirs too.

> Allow container directory permissions to be configurable
> 
>
> Key: YARN-4579
> URL: https://issues.apache.org/jira/browse/YARN-4579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-4579.001.patch, YARN-4579.002.patch, 
> YARN-4579.003.patch
>
>
> By default, container directory permissions are hardcoded to this member in 
> DefaultContainerExecutor:
>   static final short LOGDIR_PERM = (short)0710;
> There are some cases where less restrictive permissions are desired. Make 
> this configurable (a hedged configuration sketch follows this description).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3102) Decommisioned Nodes not listed in Web UI

2016-01-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095490#comment-15095490
 ] 

Hadoop QA commented on YARN-3102:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 66, now 70). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 16s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 22s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 22s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12781953/YARN-3102-v3.pa
