[jira] [Updated] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2856: - Attachment: YARN-2856.patch > Application recovery throw InvalidStateTransitonException: Invalid event: > ATTEMPT_KILLED at ACCEPTED > > > Key: YARN-2856 > URL: https://issues.apache.org/jira/browse/YARN-2856 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith > Attachments: YARN-2856.patch > > > It is observed that recovering an application with its attempt KILLED final > state throw below exception. And application remain in accepted state forever. > {code} > 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't > handle this event at current state | > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673) > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_KILLED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207775#comment-14207775 ] Hadoop QA commented on YARN-2236: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681014/YARN-2236-trunk-v6.patch against trunk revision 53f64ee. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5823//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5823//console This message is automatically generated. > Shared Cache uploader service on the Node Manager > - > > Key: YARN-2236 > URL: https://issues.apache.org/jira/browse/YARN-2236 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, > YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch, > YARN-2236-trunk-v6.patch > > > Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207748#comment-14207748 ] Sangjin Lee commented on YARN-2236: --- Karthik, the v.6 patch should address all of your comments except #8. As for #8, it is true that the event handler is a bit extraneous. But from the code standpoint, it is pretty clean and elegant. We just initialize the SharedCacheUploadService, and ContainerImpl can simply publish the event when needed. It also keeps the coupling between SharedCacheUploadService and ContainerImpl loose. It is possible to have ContainerImpl use SharedCacheUploadService directly, but then the SharedCacheUploadService would need to be passed into the ContainerImpl constructor so it can be invoked directly. So all in all, I feel that the current approach is as clean as the alternative, if not cleaner. Let me know your thoughts. Thanks! > Shared Cache uploader service on the Node Manager > - > > Key: YARN-2236 > URL: https://issues.apache.org/jira/browse/YARN-2236 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, > YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch, > YARN-2236-trunk-v6.patch > > > Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207744#comment-14207744 ] Rohith commented on YARN-2856: -- It is possible for an ATTEMPT_KILLED event to reach RMApp while recovering an attempt whose final state is KILLED. This event needs to be handled. > Application recovery throw InvalidStateTransitonException: Invalid event: > ATTEMPT_KILLED at ACCEPTED > > > Key: YARN-2856 > URL: https://issues.apache.org/jira/browse/YARN-2856 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith > > It is observed that recovering an application with its attempt KILLED final > state throw below exception. And application remain in accepted state forever. > {code} > 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't > handle this event at current state | > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673) > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_KILLED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
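A note on the fix direction: Rohith's comment amounts to registering a transition for the ATTEMPT_KILLED event while the app is still in ACCEPTED, so that recovering an attempt with a KILLED final state no longer trips the InvalidStateTransitonException in the stack trace above. The sketch below only illustrates that idea using the same StateMachineFactory API named in the trace; the toy enums, target state, and transition hook are simplified placeholders, not the actual YARN-2856 patch.
{code}
import org.apache.hadoop.yarn.state.InvalidStateTransitonException;
import org.apache.hadoop.yarn.state.SingleArcTransition;
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

/** Toy state machine (NOT the real RMAppImpl) built with the StateMachineFactory from the trace. */
public class AttemptKilledRecoverySketch {
  enum AppState { ACCEPTED, KILLED }
  enum AppEventType { ATTEMPT_KILLED }

  public static void main(String[] args) {
    // Without this arc, doTransition(ATTEMPT_KILLED) at ACCEPTED throws
    // InvalidStateTransitonException -- the recovery failure reported here.
    // Registering a transition for the event is the general shape of the fix.
    StateMachineFactory<AttemptKilledRecoverySketch, AppState, AppEventType, Object> factory =
        new StateMachineFactory<AttemptKilledRecoverySketch, AppState, AppEventType, Object>(
            AppState.ACCEPTED)
        .addTransition(AppState.ACCEPTED, AppState.KILLED, AppEventType.ATTEMPT_KILLED,
            new SingleArcTransition<AttemptKilledRecoverySketch, Object>() {
              @Override
              public void transition(AttemptKilledRecoverySketch app, Object event) {
                // e.g. record the KILLED final state and finish app cleanup here
              }
            })
        .installTopology();

    StateMachine<AppState, AppEventType, Object> sm =
        factory.make(new AttemptKilledRecoverySketch());
    try {
      sm.doTransition(AppEventType.ATTEMPT_KILLED, new Object());
      System.out.println("Recovered to state: " + sm.getCurrentState()); // KILLED
    } catch (InvalidStateTransitonException e) {
      System.out.println("Unhandled event: " + e.getMessage());
    }
  }
}
{code}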
[jira] [Updated] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2236: -- Attachment: YARN-2236-trunk-v6.patch v.6 patch posted. Again, to see the diff against the trunk, see https://github.com/ctrezzo/hadoop/compare/trunk...sharedcache-5-YARN-2236-uploader To see the diff between v.5 and v.6, see https://github.com/ctrezzo/hadoop/commit/a74f38cf3e3de824b3c6ced327acbe8e3937aef0 > Shared Cache uploader service on the Node Manager > - > > Key: YARN-2236 > URL: https://issues.apache.org/jira/browse/YARN-2236 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, > YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch, > YARN-2236-trunk-v6.patch > > > Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
Rohith created YARN-2856: Summary: Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED Key: YARN-2856 URL: https://issues.apache.org/jira/browse/YARN-2856 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith It is observed that recovering an application with its attempt KILLED final state throw below exception. And application remain in accepted state forever. {code} 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673) org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash reassigned YARN-1964: -- Assignee: Ravi Prakash (was: Abin Shahab) > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Ravi Prakash > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207709#comment-14207709 ] Ravi Prakash commented on YARN-1964: I've committed this to trunk and branch-2. I wasn't sure whether to put the release notes under release 2.6 or 2.7, but on a leap of faith, I've put it under 2.6 right now. I'll fix it if Arun declines to respin an RC. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207706#comment-14207706 ] Hudson commented on YARN-1964: -- FAILURE: Integrated in Hadoop-trunk-Commit #6517 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6517/]) YARN-1964. Create Docker analog of the LinuxContainerExecutor in YARN (raviprak: rev 53f64ee516d03f6ec87b41d77c214aa2fe4fa0ed) * hadoop-project/src/site/site.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/DockerContainerExecutor.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207659#comment-14207659 ] Hadoop QA commented on YARN-2846: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680993/YARN-2846.patch against trunk revision 46f6f9d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5822//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5822//console This message is automatically generated. > Incorrect persist exit code for running containers in reacquireContainer() > that interrupted by NodeManager restart. > --- > > Key: YARN-2846 > URL: https://issues.apache.org/jira/browse/YARN-2846 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2846-demo.patch, YARN-2846.patch > > > The NM restart work preserving feature could make running AM container get > LOST and killed during stop NM daemon. 
The exception is like below: > {code} > 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for > container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB > physical memory used; 931.3 MB of 1.0 GB virtual memory used > 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager > (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM > 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 > 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - > Applications still running : [application_1415666714233_0001] > 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 45454 > 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping > IPC Server listener on 45454 > 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(141)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping > IPC Server Responder > 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log > aggregation for application_1415666714233_0001 > 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for > application application_1415666714233_0001 > 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(476)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch > (RecoveredContainerLaunch.java:call(87)) - Unable to recover container > container_1415666714233_0001_01_01 > java.io.IOException: Interrupted while waiting for process 20001 to exit > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82)
[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207639#comment-14207639 ] Junping Du commented on YARN-2846: -- Thanks [~jlowe] for the review and comments. The latest patch addresses your comments. bq. I'm curious why we're not seeing a similar issue with regular ContainerLaunch threads, as they should be interrupted as well. Are those threads silently swallowing the interrupt? Because otherwise I would expect us to log a container completion just like we were doing with a recovered container. I am not sure about this. But if a regular ContainerLaunch gets interrupted, we may not care about the running container's exit code, as those running containers should be killed soon anyway (because the NM daemon is stopping). Am I missing anything here? > Incorrect persist exit code for running containers in reacquireContainer() > that interrupted by NodeManager restart. > --- > > Key: YARN-2846 > URL: https://issues.apache.org/jira/browse/YARN-2846 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2846-demo.patch, YARN-2846.patch > > > The NM restart work preserving feature could make running AM container get > LOST and killed during stop NM daemon. The exception is like below: > {code} > 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for > container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB > physical memory used; 931.3 MB of 1.0 GB virtual memory used > 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager > (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM > 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 > 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - > Applications still running : [application_1415666714233_0001] > 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 45454 > 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping > IPC Server listener on 45454 > 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(141)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping > IPC Server Responder > 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log > aggregation for application_1415666714233_0001 > 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for > application application_1415666714233_0001 > 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(476)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting.
> 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch > (RecoveredContainerLaunch.java:call(87)) - Unable to recover container > container_1415666714233_0001_01_01 > java.io.IOException: Interrupted while waiting for process 20001 to exit > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) > ... 6 more > {code} > In reacquireContainer() of ContainerExecutor.java, the while loop of checking > container process (AM container) will be interrupted by NM stop. The > IOException get thrown and failed to
[jira] [Updated] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2846: - Attachment: YARN-2846.patch > Incorrect persist exit code for running containers in reacquireContainer() > that interrupted by NodeManager restart. > --- > > Key: YARN-2846 > URL: https://issues.apache.org/jira/browse/YARN-2846 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2846-demo.patch, YARN-2846.patch > > > The NM restart work preserving feature could make running AM container get > LOST and killed during stop NM daemon. The exception is like below: > {code} > 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for > container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB > physical memory used; 931.3 MB of 1.0 GB virtual memory used > 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager > (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM > 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 > 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - > Applications still running : [application_1415666714233_0001] > 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 45454 > 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping > IPC Server listener on 45454 > 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(141)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping > IPC Server Responder > 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log > aggregation for application_1415666714233_0001 > 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for > application application_1415666714233_0001 > 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(476)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. 
> 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch > (RecoveredContainerLaunch.java:call(87)) - Unable to recover container > container_1415666714233_0001_01_01 > java.io.IOException: Interrupted while waiting for process 20001 to exit > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) > ... 6 more > {code} > In reacquireContainer() of ContainerExecutor.java, the while loop of checking > container process (AM container) will be interrupted by NM stop. The > IOException get thrown and failed to generate an ExitCodeFile for the running > container. Later, the IOException will be caught in upper call > (RecoveredContainerLaunch.call()) and the ExitCode (by default to be LOST > without any setting) get persistent in NMStateStore. > After NM restart again, this container is recovered as COMPLETE state but > exit code is LOST (154) - cause this (AM) container get killed later. > We should get rid of recording the exit code of running containers if > detecting process is interrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
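The conclusion in the description above — do not persist an exit code for a still-running container when the wait loop is interrupted — can be illustrated with a small, self-contained sketch. This is not the YARN-2846 patch; the method name and structure are simplified assumptions. The point is that letting InterruptedException propagate out of the poll loop gives the caller a way to distinguish "NM shutting down" from "process actually exited", so nothing like LOST gets recorded for a live container.
{code}
public class ReacquireInterruptSketch {
  /**
   * Minimal stand-in for the wait loop in reacquireContainer() (hypothetical,
   * simplified). If the NodeManager is being stopped, the sleeping thread is
   * interrupted; propagating InterruptedException lets the caller skip
   * persisting any exit code for a container that is in fact still running.
   */
  static int waitForExit(Process containerProcess) throws InterruptedException {
    while (true) {
      try {
        return containerProcess.exitValue();   // returns once the process has exited
      } catch (IllegalThreadStateException stillRunning) {
        Thread.sleep(100);                     // InterruptedException propagates on NM shutdown
      }
    }
  }
}
{code}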
[jira] [Updated] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Junjun updated YARN-2855: Fix Version/s: 2.7.0 > Wish yarn web app use local date format to show app date time > -- > > Key: YARN-2855 > URL: https://issues.apache.org/jira/browse/YARN-2855 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Li Junjun >Priority: Minor > Fix For: 2.7.0 > > > in yarn.dt.plugins.js > function renderHadoopDate use toUTCString . > I'm in China, so I need to add 8 hours in my mind every time! > I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207624#comment-14207624 ] Li Junjun commented on YARN-2855: - yes! I closed it ! > Wish yarn web app use local date format to show app date time > -- > > Key: YARN-2855 > URL: https://issues.apache.org/jira/browse/YARN-2855 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Li Junjun >Priority: Minor > Fix For: 2.7.0 > > > in yarn.dt.plugins.js > function renderHadoopDate use toUTCString . > I'm in China, so I need to add 8 hours in my mind every time! > I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Junjun resolved YARN-2855. - Resolution: Duplicate > Wish yarn web app use local date format to show app date time > -- > > Key: YARN-2855 > URL: https://issues.apache.org/jira/browse/YARN-2855 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Li Junjun >Priority: Minor > Fix For: 2.7.0 > > > in yarn.dt.plugins.js > function renderHadoopDate use toUTCString . > I'm in China, so I need to add 8 hours in my mind every time! > I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207610#comment-14207610 ] Karthik Kambatla commented on YARN-2855: Duplicate of YARN-570? > Wish yarn web app use local date format to show app date time > -- > > Key: YARN-2855 > URL: https://issues.apache.org/jira/browse/YARN-2855 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Li Junjun >Priority: Minor > > in yarn.dt.plugins.js > function renderHadoopDate use toUTCString . > I'm in China, so I need to add 8 hours in my mind every time! > I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207562#comment-14207562 ] Naganarasimha G R commented on YARN-2838: - Hi [~zjshen], I will go through YARN-2033, but I feel some of the issues are still valid even if the plan is to continue with the timeline server itself. {quote} # Whatever the CLI command user executes is historyserver or timelineserver it looks like ApplicationHistoryServer only run. So can we modify the name of the class ApplicationHistoryServer to TimelineHistoryServer (or any other suitable name as it seems like any command user runs ApplicationHistoryServer is started) # Instead of the "Starting the History Server anyway..." deprecated msg, can we have "Starting the Timeline History Server anyway...". # Based on start or stop, deprecated message should get modified to "Starting the Timeline History Server anyway..." or "Stopping the Timeline History Server anyway..." {quote} If you could comment on the individual issues/points, I would like to start fixing them as part of this JIRA. There is also a 4th issue which I mentioned: {quote} Missed to add point 4 : In YARNClientIMPL;history data can be either got from HistoryServer (old manager) or from TimeLineServer (new) So historyServiceEnabled flag needs to check for both Timeline server configurations and ApplicationHistoryServer configurations, as data can be got from either of them. {quote} I think this is also related to the issue you mentioned: ??We still didn't integrate TimelineClient and AHSClient, the latter of which is RPC interface of getting generic history information via RPC interface.?? But we need to fix this issue as well, right? Is there already a JIRA raised for it, or shall I work on it as part of this JIRA? Also, please let me know if this issue needs to be split into multiple JIRAs (apart from the documentation one you have already raised); I would like to split them up and work on them. I have already started looking into these issues and was also planning to work on the documentation. If you don't mind, could you assign that issue (YARN-2854) to me? > Issues with TimeLineServer (Application History) > > > Key: YARN-2838 > URL: https://issues.apache.org/jira/browse/YARN-2838 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: IssuesInTimelineServer.pdf > > > Few issues in usage of Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2855) Wish yarn web app use local date format to show app date time
[ https://issues.apache.org/jira/browse/YARN-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Junjun updated YARN-2855: Summary: Wish yarn web app use local date format to show app date time (was: Use local date format to show app date time ,) > Wish yarn web app use local date format to show app date time > -- > > Key: YARN-2855 > URL: https://issues.apache.org/jira/browse/YARN-2855 > Project: Hadoop YARN > Issue Type: Wish > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Li Junjun >Priority: Minor > > in yarn.dt.plugins.js > function renderHadoopDate use toUTCString . > I'm in China, so I need to add 8 hours in my mind every time! > I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2855) Use local date format to show app date time ,
Li Junjun created YARN-2855: --- Summary: Use local date format to show app date time , Key: YARN-2855 URL: https://issues.apache.org/jira/browse/YARN-2855 Project: Hadoop YARN Issue Type: Wish Components: resourcemanager Affects Versions: 2.5.1 Reporter: Li Junjun Priority: Minor in yarn.dt.plugins.js function renderHadoopDate use toUTCString . I'm in China, so I need to add 8 hours in my mind every time! I wish use toLocaleString() to format Date instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2848: -- Description: Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level "slice" of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will occur less frequently than the need to calculate headroom, userlimit, etc (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and when those have occurred, calculate an application specific cluster resource by comparing cluster nodes to it's own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations into this calculation as it will be efficient to do both at the same time and the single resource value reflecting both constraints could then be used for efficient frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). For this purpose, the application submissions's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom (Cases where the application elected to request resources not using the application level label expression are out of scope for this - but for the common usecase of an application which uses a particular expression throughout, userlimit and headroom would be accurate) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future. (was: Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster level node additions and removals) will entail managing an application-level "slice" of the cluster resource available to the application for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will change less frequently than the need to calculate headroom, userlimit, etc (which is a valid assumption given that occurs per-allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and when those have occurred, calculate a application specific cluster resource by comparing cluster nodes to it's own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations into this calculation as it will be efficient to do both at the same time and the single resource value reflecting both constraints could then be used for efficient frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the application or removal of labels of interest to the application to/from nodes). 
For this purpose, the application submissions's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom (Cases where application elected to request resources not using the application level label expression are out of scope for this - but for the common usecase of an application which uses a particular expression throughout, userlimit and headroom would be accurate) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future.) > (FICA) Applications should maintain an application specific 'cluster' > resource to calculate headroom and userlimit > -- > > Key: YARN-2848 > URL: https://issues.apache.org/jira/browse/YARN-2848 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > > Likely solutions to [YARN-1680] (properly handling node and rack blacklisting > with cluster level node additions and removals) will entail managing an > application-level "slice" of the cluster resource available to the > application for use in accurately calculating the application headroom and > user limit. There is an assumption that events which impact this resource > will occur less frequently than the need to c
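To make the YARN-2848 proposal concrete: the core step is recomputing an application-specific "cluster" resource whenever node membership (or, later, node labels) changes, and then using that value in headroom and user-limit calculations. The sketch below is a rough, hypothetical illustration of that step only; it is not code from any patch, and the NodeView type and its accessors are made up for the example (only the Resources utility calls are real YARN APIs).
{code}
import java.util.Collection;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AppClusterResourceSketch {

  /** Hypothetical per-node view used only for this sketch (not a YARN class). */
  public interface NodeView {
    String getNodeName();
    String getRackName();
    Resource getTotalCapability();
  }

  /**
   * Start from the full cluster resource and subtract nodes the application has
   * blacklisted by host or rack. A fuller version would also drop nodes excluded
   * by the app's node-label expression, and would only re-run on node add/remove
   * (or label change) events rather than on every headroom calculation.
   */
  public static Resource appClusterResource(Resource clusterResource,
      Collection<NodeView> nodes, Set<String> blacklist) {
    Resource result = Resources.clone(clusterResource);
    for (NodeView node : nodes) {
      if (blacklist.contains(node.getNodeName())
          || blacklist.contains(node.getRackName())) {
        Resources.subtractFrom(result, node.getTotalCapability());
      }
    }
    return result;
  }
}
{code}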
[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207486#comment-14207486 ] Hadoop QA commented on YARN-2853: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680948/YARN-2853.1.patch against trunk revision 163bb55. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5821//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5821//console This message is automatically generated. > Killing app may hang while AM is unregistering > -- > > Key: YARN-2853 > URL: https://issues.apache.org/jira/browse/YARN-2853 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2853.1.patch, YARN-2853.1.patch > > > When killing an app, app first moves to KILLING state, If RMAppAttempt > receives the attempt_unregister event before attempt_kill event, it'll > ignore the later attempt_kill event. Hence, RMApp won't be able to move to > KILLED state and stays at KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207485#comment-14207485 ] Sangjin Lee commented on YARN-2236: --- Thanks Karthik! Let me review them, and see what I can do. Just a quick question, in 2, did you mean marking the entire class BuilderUtils as Private or only the methods that are touched by this JIRA? > Shared Cache uploader service on the Node Manager > - > > Key: YARN-2236 > URL: https://issues.apache.org/jira/browse/YARN-2236 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, > YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch > > > Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2236) Shared Cache uploader service on the Node Manager
[ https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207464#comment-14207464 ] Karthik Kambatla commented on YARN-2236: Sorry for the delay on this, Sangjin. The patch looks generally good, but for some minor comments: # LocalResource - mark the methods Public-Unstable for now, we can mark them Public-Stable once the feature is complete. # Unrelated to this patch, can we mark BuilderUtils @Private for clarity. # Also, mark FSDownload#isPublic @Private # Rename ContainerImpl#storeSharedCacheUploadPolicies to storeSharedCacheUploadPolicy? Also, should use block comments instead of line comments. # LocalResourceRequest - LOG is unused, we should probably get rid of it along with its imports. # SharedCacheChecksumFactory ## In the map, can we use Class instead of String? ## getCheckSum should use conf.getClass for getting the class name, and ReflectionUtils.newInstance for instantiation to go with the rest of the YARN code. Refer to RMProxy for further information. # Nit: SharedCacheUploader#call - remove the TODOs # Instead of creating an event and submitting through the event-handler, would it be simpler to synchronously submit it since we are queueing it up to the executor anyway? > Shared Cache uploader service on the Node Manager > - > > Key: YARN-2236 > URL: https://issues.apache.org/jira/browse/YARN-2236 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch, > YARN-2236-trunk-v3.patch, YARN-2236-trunk-v4.patch, YARN-2236-trunk-v5.patch > > > Implement the shared cache uploader service on the node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
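Karthik's review point about getCheckSum refers to the standard Hadoop configuration-driven instantiation pattern. The sketch below illustrates that pattern only; the interface, config key, and class names are made up for the example and are not the ones in the YARN-2236 patch — only Configuration.getClass and ReflectionUtils.newInstance are real Hadoop APIs.
{code}
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class ChecksumFactorySketch {

  /** Hypothetical checksum interface standing in for the one in the patch. */
  public interface SharedCacheChecksum {
    String computeChecksum(InputStream in) throws IOException;
  }

  // Assumed config key, for illustration only.
  public static final String CHECKSUM_IMPL_KEY = "yarn.sharedcache.checksum.impl";

  /**
   * conf.getClass resolves the configured implementation (falling back to the
   * supplied default), and ReflectionUtils.newInstance instantiates it, injecting
   * the Configuration if the class is Configurable -- the same pattern RMProxy
   * and much of the rest of YARN use.
   */
  public static SharedCacheChecksum getChecksum(Configuration conf,
      Class<? extends SharedCacheChecksum> defaultImpl) {
    Class<? extends SharedCacheChecksum> clazz =
        conf.getClass(CHECKSUM_IMPL_KEY, defaultImpl, SharedCacheChecksum.class);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}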
[jira] [Comment Edited] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206671#comment-14206671 ] Zhijie Shen edited comment on YARN-2838 at 11/12/14 12:44 AM: -- [~Naganarasimha], sorry for not responding you immediately as being busy on finalizing 2.6. A quick scan through your issue document. Here's my clarification: 1. While the entry point of the this sub-module is still called ApplicationHistoryServer, it is actually generalized to be TimelineServer right now (definitely we need to refactor the code at some point). The baseline service provided the the timeline server is to allow the cluster and its apps to store their history information, metrics and so on by complying with the defined timeline data model. Later on, users and admins can query this information to do the analysis. 2. Application history (or we prefer to call it generic history service) is now a built-in service in the timeline server to record the generic history information of YARN apps. It was on a separate store (on FS), but after YARN-2033, it has been moved to the timeline store too, as a payload. We replace the old storage layer, but keep the existing interfaces (web UI, services, CLI) not changed to be the analog of what RM provides for running apps. We still didn't integrate TimelineClient and AHSClient, the latter of which is RPC interface of getting generic history information via RPC interface. APPLICATION_HISTORY_ENABLED is the only remaining old config to control whether we also want to pull the app info from the generic history service inside the timeline server. You may want to take a look at YARN-2033 to get more context about the change. Moreover, as a number of limitation of the old history store, we're no longer going to support it. 3. The document is definitely staled. I'll file separate document Jira, however, it's too late for 2.6. Let's target 2.7 for an up-to-date document about timeline service and its built-in generic history service (YARN-2854). Does it sound good? was (Author: zjshen): [~Naganarasimha], sorry for not responding you immediately as being busy on finalizing 2.6. A quick scan through your issue document. Here's my clarification: 1. While the entry point of the this sub-module is still called ApplicationHistoryServer, it is actually generalized to be TimelineServer right now (definitely we need to refactor the code at some point). The baseline service provided the the timeline server is to allow the cluster and its apps to store their history information, metrics and so on by complying with the defined timeline data model. Later on, users and admins can query this information to do the analysis. 2. Application history (or we prefer to call it generic history service) is now a built-in service in the timeline server to record the generic history information of YARN apps. It was on a separate store (on FS), but after YARN-2033, it has been moved to the timeline store too, as a payload. We replace the old storage layer, but keep the existing interfaces (web UI, services, CLI) not changed to be the analog of what RM provides for running apps. We still didn't integrate TimelineClient and AHSClient, the latter of which is RPC interface of getting generic history information via RPC interface. APPLICATION_HISTORY_ENABLED is the only remaining old config to control whether we also want to pull the app info from the generic history service inside the timeline server. 
You may want to take a look at YARN-2033 to get more context about the change. Moreover, as a number of limitation of the old history store, we're no longer going to support it. 3. The document is definitely staled. I'll file separate document Jira, however, it's too late for 2.6. Let's target 2.7 for an up-to-date document about timeline service and its built-in generic history service. Does it sound good? > Issues with TimeLineServer (Application History) > > > Key: YARN-2838 > URL: https://issues.apache.org/jira/browse/YARN-2838 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: IssuesInTimelineServer.pdf > > > Few issues in usage of Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2854) The document about timeline service and generic service needs to be updated
Zhijie Shen created YARN-2854: - Summary: The document about timeline service and generic service needs to be updated Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207405#comment-14207405 ] Craig Welch commented on YARN-2637: --- I think the fix is fairly straightforward - there is an "amResource" property on the SchedulerApplicationAttempt / FiCaSchedulerApp, but it does not appear to be populated in the CapacityScheduler case (it should be, and the information is available in the submission / from the resource requests of the application). Populate this value, and then add a Resource property to LeafQueue which represents the resources used by active application masters. When an application starts, add its amResource value to the LeafQueue's active application master resource value; when an application ends, remove it. Before starting an application, compare the sum of the active application masters' resources + the new application's AM resource to the resource represented by the percentage of cluster resource allowed to be used by AMs in the queue (this can differ by queue...), and if it exceeds that value, do not start the application. The existing trickle-down logic based on the minimum allocation should be removed; the logic regarding how many applications can be running based on explicit configuration should be retained. {code} if ((queue.activeApplicationMasterResourceTotal + readyToStartApplication.applicationMasterResource) <= queue.portionOfClusterResourceAllowedForApplicationMaster * clusterResource && runningApplications + 1 <= maxAllowedApplications) { queue.startTheApp } {code} > maximum-am-resource-percent could be violated when resource of AM is > > minimumAllocation > > > Key: YARN-2637 > URL: https://issues.apache.org/jira/browse/YARN-2637 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Wangda Tan >Priority: Critical > > Currently, number of AM in leaf queue will be calculated in following way: > {code} > max_am_resource = queue_max_capacity * maximum_am_resource_percent > #max_am_number = max_am_resource / minimum_allocation > #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor > {code} > And when submit new application to RM, it will check if an app can be > activated in following way: > {code} > for (Iterator i=pendingApplications.iterator(); > i.hasNext(); ) { > FiCaSchedulerApp application = i.next(); > > // Check queue limit > if (getNumActiveApplications() >= getMaximumActiveApplications()) { > break; > } > > // Check user limit > User user = getUser(application.getUser()); > if (user.getActiveApplications() < > getMaximumActiveApplicationsPerUser()) { > user.activateApplication(); > activeApplications.add(application); > i.remove(); > LOG.info("Application " + application.getApplicationId() + > " from user: " + application.getUser() + > " activated in queue: " + getQueueName()); > } > } > {code} > An example is, > If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum > resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be > launched is 200, and if user uses 5M for each AM (> minimum_allocation). All > apps can still be activated, and it will occupy all resource of a queue > instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207390#comment-14207390 ] Ravi Prakash commented on YARN-1964: I'm a +1 on this patch. I'll commit it to trunk and branch-2 soon. Soon as I get confirmation from Arun, I'll commit it into branch-2.6 as well. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2853: -- Attachment: (was: YARN-2853.1.patch) > Killing app may hang while AM is unregistering > -- > > Key: YARN-2853 > URL: https://issues.apache.org/jira/browse/YARN-2853 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2853.1.patch, YARN-2853.1.patch > > > When killing an app, the app first moves to the KILLING state. If RMAppAttempt > receives the attempt_unregister event before the attempt_kill event, it'll > ignore the later attempt_kill event. Hence, RMApp won't be able to move to > the KILLED state and stays at the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2853: -- Attachment: YARN-2853.1.patch > Killing app may hang while AM is unregistering > -- > > Key: YARN-2853 > URL: https://issues.apache.org/jira/browse/YARN-2853 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2853.1.patch, YARN-2853.1.patch > > > When killing an app, the app first moves to the KILLING state. If RMAppAttempt > receives the attempt_unregister event before the attempt_kill event, it'll > ignore the later attempt_kill event. Hence, RMApp won't be able to move to > the KILLED state and stays at the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2853: -- Attachment: YARN-2853.1.patch > Killing app may hang while AM is unregistering > -- > > Key: YARN-2853 > URL: https://issues.apache.org/jira/browse/YARN-2853 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2853.1.patch, YARN-2853.1.patch > > > When killing an app, the app first moves to the KILLING state. If RMAppAttempt > receives the attempt_unregister event before the attempt_kill event, it'll > ignore the later attempt_kill event. Hence, RMApp won't be able to move to > the KILLED state and stays at the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2806) log container allocation requests
[ https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207304#comment-14207304 ] Hadoop QA commented on YARN-2806: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680883/YARN-2806.patch against trunk revision 163bb55. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5819//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5819//console This message is automatically generated. > log container allocation requests > - > > Key: YARN-2806 > URL: https://issues.apache.org/jira/browse/YARN-2806 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Allen Wittenauer > Attachments: YARN-2806.patch > > > I might have missed it, but I don't see where we log application container > requests outside of the DEBUG context. Without this being logged, we have no > idea on a per application the lag an application might be having in the > allocation system. > We should probably add this as an event to the RM audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207303#comment-14207303 ] Hadoop QA commented on YARN-2853: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680930/YARN-2853.1.patch against trunk revision 163bb55. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5820//console This message is automatically generated. > Killing app may hang while AM is unregistering > -- > > Key: YARN-2853 > URL: https://issues.apache.org/jira/browse/YARN-2853 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2853.1.patch > > > When killing an app, the app first moves to the KILLING state. If RMAppAttempt > receives the attempt_unregister event before the attempt_kill event, it'll > ignore the later attempt_kill event. Hence, RMApp won't be able to move to > the KILLED state and stays at the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207292#comment-14207292 ] Jian He commented on YARN-2853: --- Instead, we could get rid of the KILLING state completely, let the app stay at its original state, and change RMApp to handle the attempt_killed event at each possible state. This way, we could avoid race conditions like this one. I'll file a separate JIRA to do this. > Killing app may hang while AM is unregistering > -- > > Key: YARN-2853 > URL: https://issues.apache.org/jira/browse/YARN-2853 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2853.1.patch > > > When killing an app, the app first moves to the KILLING state. If RMAppAttempt > receives the attempt_unregister event before the attempt_kill event, it'll > ignore the later attempt_kill event. Hence, RMApp won't be able to move to > the KILLED state and stays at the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2853) Killing app may hang while AM is unregistering
[ https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2853: -- Attachment: YARN-2853.1.patch Uploaded a patch to handle the possible attempt_unregistered, attempt_failed, and attempt_finished events at the app KILLING state. > Killing app may hang while AM is unregistering > -- > > Key: YARN-2853 > URL: https://issues.apache.org/jira/browse/YARN-2853 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2853.1.patch > > > When killing an app, the app first moves to the KILLING state. If RMAppAttempt > receives the attempt_unregister event before the attempt_kill event, it'll > ignore the later attempt_kill event. Hence, RMApp won't be able to move to > the KILLED state and stays at the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
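As a rough illustration of why handling these events matters, here is a small self-contained state-machine sketch in plain Java; the states, events, and transitions are simplified placeholders, not the real RMApp/RMAppImpl state machine. Without the extra cases at KILLING, an attempt event that arrives before ATTEMPT_KILLED is dropped and the app never leaves KILLING:
{code}
// Simplified sketch only; not the actual YARN state machine.
enum AppState { RUNNING, KILLING, FINAL_SAVING, KILLED }
enum AppEvent { KILL, ATTEMPT_UNREGISTERED, ATTEMPT_FINISHED, ATTEMPT_FAILED, ATTEMPT_KILLED, APP_SAVED }

class AppStateMachineSketch {
  AppState state = AppState.RUNNING;

  void handle(AppEvent event) {
    switch (state) {
      case RUNNING:
        if (event == AppEvent.KILL) {
          state = AppState.KILLING;
        }
        break;
      case KILLING:
        // Without these cases, an attempt event that races ahead of
        // ATTEMPT_KILLED would be ignored and the app would hang here forever.
        if (event == AppEvent.ATTEMPT_KILLED
            || event == AppEvent.ATTEMPT_UNREGISTERED
            || event == AppEvent.ATTEMPT_FINISHED
            || event == AppEvent.ATTEMPT_FAILED) {
          state = AppState.FINAL_SAVING;
        }
        break;
      case FINAL_SAVING:
        if (event == AppEvent.APP_SAVED) {
          state = AppState.KILLED;
        }
        break;
      default:
        break;
    }
  }
}
{code}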
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207268#comment-14207268 ] Wangda Tan commented on YARN-2729: -- Hi [~Naganarasimha], IIRC, the script-based patch should be based on YARN-2495, and we should create a script-based labels provider that extends NodeLabelsProviderService, correct? But I haven't seen much relationship between this and YARN-2495 besides the configuration options. Please let me know if I understood incorrectly. Thanks, Wangda > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2853) Killing app may hang while AM is unregistering
Jian He created YARN-2853: - Summary: Killing app may hang while AM is unregistering Key: YARN-2853 URL: https://issues.apache.org/jira/browse/YARN-2853 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He When killing an app, the app first moves to the KILLING state. If RMAppAttempt receives the attempt_unregister event before the attempt_kill event, it'll ignore the later attempt_kill event. Hence, RMApp won't be able to move to the KILLED state and stays at the KILLING state forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207251#comment-14207251 ] Karthik Kambatla commented on YARN-2604: Looks mostly good. Just want to confirm - when there are no nodes connected to the RM, the patch sets the max-allocation to the configured value and not zero. I think this is good, otherwise all apps will get rejected immediately after the RM (re)starts. Actually, I wonder if we should add a config to specify either (a) a particular number of NMs after which this behavior kicks in or (b) a minimum/floor value for the configurable maximum (min-max-allocation :P). [~jlowe] - do you think such a config would be useful? Few comments on the patch itself: # We should have tests similar to TestFifoScheduler#testMaximumAllocation for Capacity and FairSchedulers. # Nit: Rename AbstractYarnScheduler#realMaximumAllocation to configuredMaximumAllocation? And, in all the schedulers, we should set configuredMaximumAllocation first and then set maximumAllocation to that. Also, given both these fields are in AbstractYarnScheduler, I wouldn't refer to them using {{this.}} in the sub-classes. # Nit: With locks and unlocks, we follow the following convention in YARN. Mind updating accordingly? {code} lock.lock(); try { // do your thing } finally { lock.unlock(); } {code} > Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch > > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
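For illustration, one way the scheduler could derive an effective maximum allocation along the lines discussed above is sketched here; the class and method names are hypothetical, not the actual patch:
{code}
// Hypothetical sketch: cap the configured maximum allocation by the largest
// registered node, falling back to the configured value when no NMs are connected.
import org.apache.hadoop.yarn.api.records.Resource;

public class EffectiveMaxAllocationSketch {
  static Resource effectiveMaxAllocation(Resource configuredMaximumAllocation,
      Resource largestRegisteredNode /* null when no nodes have registered */) {
    if (largestRegisteredNode == null) {
      // Keep the configured value so apps are not rejected right after an RM (re)start.
      return configuredMaximumAllocation;
    }
    return Resource.newInstance(
        Math.min(configuredMaximumAllocation.getMemory(), largestRegisteredNode.getMemory()),
        Math.min(configuredMaximumAllocation.getVirtualCores(), largestRegisteredNode.getVirtualCores()));
  }
}
{code}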
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207194#comment-14207194 ] Carlo Curino commented on YARN-2009: Sunil, a thing to keep in mind is that any preemption action you initiate has a significant delay, i.e., we will see its effects only a while later, likely under somewhat changed cluster conditions, app needs, etc. For this reason we decided to maximize the flexibility of the application being preempted (allowing for late binding on which containers to yield back), instead of constraining the requests with strict locality preferences. The intuition is that we have a better chance to be efficient on the preempted side than on the preempting side (already-running tasks and immediate impact, vs. hypotheses about task locality for future containers). I don't have any strong evidence to back those intuitions (which are likely to hold for some workloads but probably not all), but I suggest you consider these concerns, and maybe devise some good experiments to test whether locality-centric preemption gets you the benefit you hope for (it is otherwise an unnecessary complication that has hard-to-understand interactions with fairness/priorities, etc.). Similar thoughts apply to node labels; however, I believe in that context the needs are likely to be more "stable" over time, so preempting in a label-aware manner might be good. My 2 cents, Carlo > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207172#comment-14207172 ] Wangda Tan commented on YARN-2056: -- Hi [~eepayne], thanks for the update. Major comments: 1. Regarding the new requirement you added: bq. The current patch only allows the disable queue preemption flag to be set on leaf queues. However, after discussing this internally, we need to be able to have leaf queues inherit this property from their parent. I think the feature makes sense, but I don't think we can really achieve it today. In your example, let's say root has a and b, a has a1/a2, and b has b1/b2. It may be no problem to set a as non-preemptable while a1/a2 are preemptable; the existing ideal_capacity calculation algorithm will consider this and mark containers to be preempted in a1/a2 as you expected. However, you cannot say that, in this case, b cannot preempt resource from a, because when a container is preempted, the freed resource is available for everyone to use. For example, the resource freed after preempting a container from a2 is not dedicated to a1 only. So the statements are not always true: {code} A should not be preemptable A1 and A2 should be able to preempt each other {code} So my opinion is to not do it now, since we don't have the corresponding logic on the CS side for this feature. Minor comments: 1. The name getUnderservedQueues is a little confusing to me; it should be getMostUnderServedQueues. 2. I would suggest creating a method CapacitySchedulerConfiguration.getPreemptionEnabled(queue). 3. In {{getUnderservedQueues}}, you can simply use tqComparator.compare(q0, q1) instead of calculating idealPctGuaranteed. 4. It's better to rename idealPctGuaranteed -> calculate (or get)IdealPctGuaranteed. For the tests, I'll review them after we reach a decision about my concern in the *major comments*. Wangda > Disable preemption at Queue level > - > > Key: YARN-2056 > URL: https://issues.apache.org/jira/browse/YARN-2056 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Mayank Bansal >Assignee: Eric Payne > Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, > YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, > YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, > YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, > YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, > YARN-2056.201410232244.txt, YARN-2056.201410311746.txt, > YARN-2056.201411041635.txt, YARN-2056.201411072153.txt > > > We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
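As a rough sketch of the suggested CapacitySchedulerConfiguration.getPreemptionEnabled(queue) helper, assuming a hypothetical property name and default (not the final YARN-2056 configuration keys):
{code}
// Rough sketch; the property name and default value are assumptions.
import org.apache.hadoop.conf.Configuration;

public class PreemptionConfigSketch {
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final String DISABLE_PREEMPTION = ".disable_preemption";

  /** Returns true if preemption is enabled for the given queue path, e.g. "root.a". */
  static boolean getPreemptionEnabled(Configuration conf, String queuePath) {
    boolean disabled = conf.getBoolean(PREFIX + queuePath + DISABLE_PREEMPTION, false);
    return !disabled;
  }
}
{code}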
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207159#comment-14207159 ] Hadoop QA commented on YARN-1964: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680880/YARN-1964.patch against trunk revision 99d9d0c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5817//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5817//console This message is automatically generated. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207140#comment-14207140 ] Siqi Li commented on YARN-2811: --- [~sandyr] Thank you for pointing out the hierarchical scenario. I have updated the patch that deals with that case. > Fair Scheduler is violating max memory settings in 2.4 > -- > > Key: YARN-2811 > URL: https://issues.apache.org/jira/browse/YARN-2811 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, > YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch > > > This has been seen on several queues showing the allocated MB going > significantly above the max MB and it appears to have started with the 2.4 > upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2806) log container allocation requests
[ https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207122#comment-14207122 ] Hadoop QA commented on YARN-2806: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680883/YARN-2806.patch against trunk revision 456b973. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5818//console This message is automatically generated. > log container allocation requests > - > > Key: YARN-2806 > URL: https://issues.apache.org/jira/browse/YARN-2806 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Allen Wittenauer > Attachments: YARN-2806.patch > > > I might have missed it, but I don't see where we log application container > requests outside of the DEBUG context. Without this being logged, we have no > idea on a per application the lag an application might be having in the > allocation system. > We should probably add this as an event to the RM audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207123#comment-14207123 ] Hadoop QA commented on YARN-2811: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680866/YARN-2811.v5.patch against trunk revision 061bc29. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5816//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5816//console This message is automatically generated. > Fair Scheduler is violating max memory settings in 2.4 > -- > > Key: YARN-2811 > URL: https://issues.apache.org/jira/browse/YARN-2811 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, > YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch > > > This has been seen on several queues showing the allocated MB going > significantly above the max MB and it appears to have started with the 2.4 > upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207114#comment-14207114 ] Hudson commented on YARN-570: - SUCCESS: Integrated in Hadoop-trunk-Commit #6514 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6514/]) YARN-570. Time strings are formated in different timezone. (Akira Ajisaka and Peng Zhang via kasha) (kasha: rev 456b973819904e9647dabad292d2d6205dd84399) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/yarn.dt.plugins.js * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Times.java > Time strings are formated in different timezone > --- > > Key: YARN-570 > URL: https://issues.apache.org/jira/browse/YARN-570 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.2.0 >Reporter: Peng Zhang >Assignee: Akira AJISAKA > Fix For: 2.7.0 > > Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch, > YARN-570.3.patch, YARN-570.4.patch, YARN-570.5.patch > > > Time strings on different page are displayed in different timezone. > If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as > "Wed, 10 Apr 2013 08:29:56 GMT" > If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 > 16:29:56" > Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
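For reference, the underlying issue is simply that the same epoch timestamp was rendered with different time zones in different places. A minimal illustration (not the committed fix) of pinning the time zone when formatting:
{code}
// Illustration only: the same instant rendered consistently by fixing the time zone.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeZoneFormatSketch {
  static String format(long ts, TimeZone tz) {
    if (ts <= 0) {
      return "N/A";
    }
    SimpleDateFormat fmt = new SimpleDateFormat("EEE dd MMM yyyy HH:mm:ss Z");
    fmt.setTimeZone(tz);
    return fmt.format(new Date(ts));
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(format(now, TimeZone.getTimeZone("GMT"))); // GMT rendering
    System.out.println(format(now, TimeZone.getDefault()));       // server-local rendering
  }
}
{code}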
[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207109#comment-14207109 ] Craig Welch commented on YARN-2848: --- There are a couple of different ways events at the cluster level (nodelabel additions/removals, node additions/removals) could be handled by the application to update its own resource: they could merely be a trigger that causes the application to recalculate the value from scratch (just a "last event" map/value set in the scheduler/etc. (topical, node add/remove + per label) (serial/vector clock value/ts, etc.)), or they could include sufficient information for the application to adjust its resource without necessarily having to look at a global view (per-node "labels added", "labels removed", the node (incl. rack) which was added to or removed from the cluster) (perhaps available "for a time" or "for N changes", with fallback to a global calculation - this may be more complex than is warranted). > (FICA) Applications should maintain an application specific 'cluster' > resource to calculate headroom and userlimit > -- > > Key: YARN-2848 > URL: https://issues.apache.org/jira/browse/YARN-2848 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > > Likely solutions to [YARN-1680] (properly handling node and rack blacklisting > with cluster-level node additions and removals) will entail managing an > application-level "slice" of the cluster resource available to the > application, for use in accurately calculating the application headroom and > user limit. There is an assumption that events which impact this resource > will change less frequently than the need to calculate headroom, userlimit, > etc. (which is a valid assumption given that the latter occurs per allocation heartbeat). > Given that, the application should (with assistance from cluster-level > code...) detect changes to the composition of the cluster (node addition, > removal) and, when those have occurred, calculate an application-specific > cluster resource by comparing cluster nodes to its own blacklist (both rack > and individual node). I think it makes sense to include nodelabel > considerations in this calculation, as it will be efficient to do both at > the same time, and the single resource value reflecting both constraints could > then be used for efficient, frequent headroom and userlimit calculations while > remaining highly accurate. The application would need to be made aware of > nodelabel changes it is interested in (the addition or removal of labels > of interest to the application to/from nodes). For this purpose, the > application submission's nodelabel expression would be used to determine the > nodelabel impact on the resource used to calculate userlimit and headroom. > (Cases where an application elected to request resources not using the > application-level label expression are out of scope for this - but for the > common use case of an application which uses a particular expression > throughout, userlimit and headroom would be accurate.) This could also provide > an overall mechanism for handling application-specific resource constraints > which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
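As an illustration of the first option (recalculating from scratch only when a cluster-change trigger has been seen), a minimal sketch with entirely hypothetical names follows; it is not the CapacityScheduler implementation:
{code}
// Hypothetical sketch: cache an application-specific cluster resource and only
// recompute it when a cluster-change "version" (last event marker) advances.
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AppClusterResourceSketch {
  private long lastSeenClusterVersion = -1;
  private Resource appClusterResource = Resources.createResource(0, 0);

  Resource get(long clusterVersion, Map<String, Resource> nodeResources,
      Set<String> blacklistedNodes) {
    if (clusterVersion != lastSeenClusterVersion) {
      Resource total = Resources.createResource(0, 0);
      for (Map.Entry<String, Resource> e : nodeResources.entrySet()) {
        // Skip nodes this application has blacklisted (rack handling omitted).
        if (!blacklistedNodes.contains(e.getKey())) {
          Resources.addTo(total, e.getValue());
        }
      }
      appClusterResource = total;
      lastSeenClusterVersion = clusterVersion;
    }
    return appClusterResource;
  }
}
{code}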
[jira] [Commented] (YARN-2806) log container allocation requests
[ https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207108#comment-14207108 ] Eric Wohlstadter commented on YARN-2806: Added a patch for AppSchedulingInfo.updateResourceRequests. > log container allocation requests > - > > Key: YARN-2806 > URL: https://issues.apache.org/jira/browse/YARN-2806 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Allen Wittenauer > Attachments: YARN-2806.patch > > > I might have missed it, but I don't see where we log application container > requests outside of the DEBUG context. Without this being logged, we have no > idea, on a per-application basis, of the lag an application might be experiencing in the > allocation system. > We should probably add this as an event to the RM audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2806) log container allocation requests
[ https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Wohlstadter updated YARN-2806: --- Attachment: YARN-2806.patch > log container allocation requests > - > > Key: YARN-2806 > URL: https://issues.apache.org/jira/browse/YARN-2806 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Allen Wittenauer > Attachments: YARN-2806.patch > > > I might have missed it, but I don't see where we log application container > requests outside of the DEBUG context. Without this being logged, we have no > idea, on a per-application basis, of the lag an application might be experiencing in the > allocation system. > We should probably add this as an event to the RM audit log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2852) WebUI: Add disk I/O resource information to the web ui
Wei Yan created YARN-2852: - Summary: WebUI: Add disk I/O resource information to the web ui Key: YARN-2852 URL: https://issues.apache.org/jira/browse/YARN-2852 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2851) YarnClient: Add support for disk I/O resource/request information
Wei Yan created YARN-2851: - Summary: YarnClient: Add support for disk I/O resource/request information Key: YARN-2851 URL: https://issues.apache.org/jira/browse/YARN-2851 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207088#comment-14207088 ] Craig Welch commented on YARN-2848: --- To be clear wrt node labels - this is to enable accurate support, from a headroom and userlimit perspective, for more complex label expressions. At present I believe single-label expressions in relation to (up to) single-label nodes can be accurate; this should allow for accuracy in more sophisticated scenarios. > (FICA) Applications should maintain an application specific 'cluster' > resource to calculate headroom and userlimit > -- > > Key: YARN-2848 > URL: https://issues.apache.org/jira/browse/YARN-2848 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > > Likely solutions to [YARN-1680] (properly handling node and rack blacklisting > with cluster-level node additions and removals) will entail managing an > application-level "slice" of the cluster resource available to the > application, for use in accurately calculating the application headroom and > user limit. There is an assumption that events which impact this resource > will change less frequently than the need to calculate headroom, userlimit, > etc. (which is a valid assumption given that the latter occurs per allocation heartbeat). > Given that, the application should (with assistance from cluster-level > code...) detect changes to the composition of the cluster (node addition, > removal) and, when those have occurred, calculate an application-specific > cluster resource by comparing cluster nodes to its own blacklist (both rack > and individual node). I think it makes sense to include nodelabel > considerations in this calculation, as it will be efficient to do both at > the same time, and the single resource value reflecting both constraints could > then be used for efficient, frequent headroom and userlimit calculations while > remaining highly accurate. The application would need to be made aware of > nodelabel changes it is interested in (the addition or removal of labels > of interest to the application to/from nodes). For this purpose, the > application submission's nodelabel expression would be used to determine the > nodelabel impact on the resource used to calculate userlimit and headroom. > (Cases where an application elected to request resources not using the > application-level label expression are out of scope for this - but for the > common use case of an application which uses a particular expression > throughout, userlimit and headroom would be accurate.) This could also provide > an overall mechanism for handling application-specific resource constraints > which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2850) DistributedShell: Add support for disk I/O request
Wei Yan created YARN-2850: - Summary: DistributedShell: Add support for disk I/O request Key: YARN-2850 URL: https://issues.apache.org/jira/browse/YARN-2850 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2849) MRAppMaster: Add support for disk I/O request
Wei Yan created YARN-2849: - Summary: MRAppMaster: Add support for disk I/O request Key: YARN-2849 URL: https://issues.apache.org/jira/browse/YARN-2849 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-570) Time strings are formated in different timezone
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207083#comment-14207083 ] Karthik Kambatla commented on YARN-570: --- The patch looks reasonable. +1, relying on others' testing. Checking this in, will add one comment in Times.java in the process. > Time strings are formated in different timezone > --- > > Key: YARN-570 > URL: https://issues.apache.org/jira/browse/YARN-570 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Affects Versions: 2.2.0 >Reporter: Peng Zhang >Assignee: Akira AJISAKA > Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch, > YARN-570.3.patch, YARN-570.4.patch, YARN-570.5.patch > > > Time strings on different page are displayed in different timezone. > If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as > "Wed, 10 Apr 2013 08:29:56 GMT" > If it is formatted by format() in yarn.util.Times, it appears as "10-Apr-2013 > 16:29:56" > Same value, but different timezone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207080#comment-14207080 ] Swapnil Daingade commented on YARN-2791: Thanks Karthik Kambatla. Sure, let's make this a sub-task of YARN-2139. > Add Disk as a resource for scheduling > - > > Key: YARN-2791 > URL: https://issues.apache.org/jira/browse/YARN-2791 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Swapnil Daingade >Assignee: Yuliya Feldman > Attachments: DiskDriveAsResourceInYARN.pdf > > > Currently, the number of disks present on a node is not considered a factor > while scheduling containers on that node. Having a large amount of memory on a > node can lead to a high number of containers being launched on that node, all > of which compete for I/O bandwidth. This multiplexing of I/O across > containers can lead to slower overall progress and sub-optimal resource > utilization, as containers starved for I/O bandwidth hold on to other > resources like cpu and memory. This problem can be solved by considering disk > as a resource and including it in deciding how many containers can be > concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: YARN-1964.patch Added docs to the site.xml > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
[ https://issues.apache.org/jira/browse/YARN-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207060#comment-14207060 ] Craig Welch commented on YARN-2848: --- There are avenues to enhance this later for multiple nodelabel expressions if so desired; likely the api for headroom, etc. would need to be broadened to include a label expression, and a "set" of application label expressions for calculating resources would need to be held. This is currently specific to the Capacity Scheduler, but might be applicable to other schedulers as well. > (FICA) Applications should maintain an application specific 'cluster' > resource to calculate headroom and userlimit > -- > > Key: YARN-2848 > URL: https://issues.apache.org/jira/browse/YARN-2848 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > > Likely solutions to [YARN-1680] (properly handling node and rack blacklisting > with cluster-level node additions and removals) will entail managing an > application-level "slice" of the cluster resource available to the > application, for use in accurately calculating the application headroom and > user limit. There is an assumption that events which impact this resource > will change less frequently than the need to calculate headroom, userlimit, > etc. (which is a valid assumption given that the latter occurs per allocation heartbeat). > Given that, the application should (with assistance from cluster-level > code...) detect changes to the composition of the cluster (node addition, > removal) and, when those have occurred, calculate an application-specific > cluster resource by comparing cluster nodes to its own blacklist (both rack > and individual node). I think it makes sense to include nodelabel > considerations in this calculation, as it will be efficient to do both at > the same time, and the single resource value reflecting both constraints could > then be used for efficient, frequent headroom and userlimit calculations while > remaining highly accurate. The application would need to be made aware of > nodelabel changes it is interested in (the addition or removal of labels > of interest to the application to/from nodes). For this purpose, the > application submission's nodelabel expression would be used to determine the > nodelabel impact on the resource used to calculate userlimit and headroom. > (Cases where an application elected to request resources not using the > application-level label expression are out of scope for this - but for the > common use case of an application which uses a particular expression > throughout, userlimit and headroom would be accurate.) This could also provide > an overall mechanism for handling application-specific resource constraints > which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207055#comment-14207055 ] Wangda Tan commented on YARN-2495: -- bq. ... hence shall i create a new class extending TestPBImplRecords in hadoop-yarn-server-common project. ? This is an issue; I suggest keeping your code as-is, but please add checks in your tests for the values you added. And in the future, PB objects in h-y-server-common should have an easier way to do testing, as in h-y-common. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495_20141022.1.patch > > > The target of this JIRA is to allow the admin to specify labels on each NM; this covers > - User can set labels on each NM (by setting yarn-site.xml or using the script > suggested by [~aw]) > - NM will send labels to RM via the ResourceTracker API > - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
[ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207057#comment-14207057 ] Wangda Tan commented on YARN-2843: -- Thanks for [~vinodkv]'s review and commit! > NodeLabels manager should trim all inputs for hosts and labels > -- > > Key: YARN-2843 > URL: https://issues.apache.org/jira/browse/YARN-2843 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sushmitha Sreenivasan >Assignee: Wangda Tan > Fix For: 2.7.0 > > Attachments: YARN-2843-1.patch, YARN-2843-2.patch > > > NodeLabels manager should trim all inputs for hosts and labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2848) (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit
Craig Welch created YARN-2848: - Summary: (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit Key: YARN-2848 URL: https://issues.apache.org/jira/browse/YARN-2848 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Likely solutions to [YARN-1680] (properly handling node and rack blacklisting with cluster-level node additions and removals) will entail managing an application-level "slice" of the cluster resource available to the application, for use in accurately calculating the application headroom and user limit. There is an assumption that events which impact this resource will change less frequently than the need to calculate headroom, userlimit, etc. (which is a valid assumption given that the latter occurs per allocation heartbeat). Given that, the application should (with assistance from cluster-level code...) detect changes to the composition of the cluster (node addition, removal) and, when those have occurred, calculate an application-specific cluster resource by comparing cluster nodes to its own blacklist (both rack and individual node). I think it makes sense to include nodelabel considerations in this calculation, as it will be efficient to do both at the same time, and the single resource value reflecting both constraints could then be used for efficient, frequent headroom and userlimit calculations while remaining highly accurate. The application would need to be made aware of nodelabel changes it is interested in (the addition or removal of labels of interest to the application to/from nodes). For this purpose, the application submission's nodelabel expression would be used to determine the nodelabel impact on the resource used to calculate userlimit and headroom. (Cases where an application elected to request resources not using the application-level label expression are out of scope for this - but for the common use case of an application which uses a particular expression throughout, userlimit and headroom would be accurate.) This could also provide an overall mechanism for handling application-specific resource constraints which might be added in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207046#comment-14207046 ] Ravi Prakash commented on YARN-1964: Hi Karthik! That's fair. I'll ask Arun if he is willing to re-spin 2.6.0. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
[ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207017#comment-14207017 ] Hudson commented on YARN-2843: -- FAILURE: Integrated in Hadoop-trunk-Commit #6511 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6511/]) YARN-2843. Fixed NodeLabelsManager to trim inputs for hosts and labels so as to make them work correctly. Contributed by Wangda Tan. (vinodkv: rev 0fd97f9c1989a793b882e6678285607472a3f75a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/NodeLabelTestBase.java > NodeLabels manager should trim all inputs for hosts and labels > -- > > Key: YARN-2843 > URL: https://issues.apache.org/jira/browse/YARN-2843 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sushmitha Sreenivasan >Assignee: Wangda Tan > Attachments: YARN-2843-1.patch, YARN-2843-2.patch > > > NodeLabels manager should trim all inputs for hosts and labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
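A minimal sketch of the kind of input normalization described here, assuming illustrative class and method names (not the committed CommonNodeLabelsManager code):
{code}
// Illustrative sketch: trim host/label inputs and drop empty entries.
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class LabelInputTrimSketch {
  static Set<String> normalize(Collection<String> inputs) {
    Set<String> trimmed = new HashSet<String>();
    if (inputs == null) {
      return trimmed;
    }
    for (String s : inputs) {
      if (s == null) {
        continue;
      }
      String t = s.trim();
      if (!t.isEmpty()) {
        trimmed.add(t);
      }
    }
    return trimmed;
  }
}
{code}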
[jira] [Commented] (YARN-2843) NodeLabels manager should trim all inputs for hosts and labels
[ https://issues.apache.org/jira/browse/YARN-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206999#comment-14206999 ] Vinod Kumar Vavilapalli commented on YARN-2843: --- +1, looks good. Checking this in. > NodeLabels manager should trim all inputs for hosts and labels > -- > > Key: YARN-2843 > URL: https://issues.apache.org/jira/browse/YARN-2843 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sushmitha Sreenivasan >Assignee: Wangda Tan > Attachments: YARN-2843-1.patch, YARN-2843-2.patch > > > NodeLabels manager should trim all inputs for hosts and labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206994#comment-14206994 ] Karthik Kambatla commented on YARN-2139: Thanks for the prototype, Wei. In light of the updates on YARN-2791 and YARN-2817, I propose we incorporate suggestions from [~sdaingade] and [~acmurthy] before posting patches for sub-tasks. Updated JIRA title, description, and marked it unassigned as this is an umbrella JIRA. > [Umbrella] Support for Disk as a Resource in YARN > -- > > Key: YARN-2139 > URL: https://issues.apache.org/jira/browse/YARN-2139 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wei Yan > Attachments: Disk_IO_Scheduling_Design_1.pdf, > Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, > YARN-2139-prototype.patch > > > YARN should consider disk as another resource for (1) scheduling tasks on > nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2791) Add Disk as a resource for scheduling
[ https://issues.apache.org/jira/browse/YARN-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206993#comment-14206993 ] Karthik Kambatla commented on YARN-2791: Thanks [~sdaingade] for sharing the design doc. Well articulated. The designs on YARN-2139 and YARN-2791 are very similar, except that the disk resources are called vdisks in YARN-2139 and spindles in YARN-2791. In addition to the items specified here, YARN-2139 talks about isolation as well. Other than that, do you see any major items YARN-2791 covers that YARN-2139 doesn't? The WebUI is good and very desirable; we should definitely include it. Also, I suggest we make this (as is - or split into multiple JIRAs) a sub-task of YARN-2139. Discussing the high-level details on one JIRA helps with aligning on one final design doc based on everyone's suggestions. > Add Disk as a resource for scheduling > - > > Key: YARN-2791 > URL: https://issues.apache.org/jira/browse/YARN-2791 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Swapnil Daingade >Assignee: Yuliya Feldman > Attachments: DiskDriveAsResourceInYARN.pdf > > > Currently, the number of disks present on a node is not considered a factor > while scheduling containers on that node. Having a large amount of memory on a > node can lead to a high number of containers being launched on that node, all > of which compete for I/O bandwidth. This multiplexing of I/O across > containers can lead to slower overall progress and sub-optimal resource > utilization, as containers starved for I/O bandwidth hold on to other > resources like cpu and memory. This problem can be solved by considering disk > as a resource and including it in deciding how many containers can be > concurrently run on a node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2139: --- Description: YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. (was: YARN should support considering disk for scheduling tasks on nodes, and provide isolation for these allocations at runtime.) > [Umbrella] Support for Disk as a Resource in YARN > -- > > Key: YARN-2139 > URL: https://issues.apache.org/jira/browse/YARN-2139 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wei Yan > Attachments: Disk_IO_Scheduling_Design_1.pdf, > Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, > YARN-2139-prototype.patch > > > YARN should consider disk as another resource for (1) scheduling tasks on > nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2139: --- Summary: [Umbrella] Support for Disk as a Resource in YARN (was: Add support for disk IO isolation/scheduling for containers) > [Umbrella] Support for Disk as a Resource in YARN > -- > > Key: YARN-2139 > URL: https://issues.apache.org/jira/browse/YARN-2139 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wei Yan > Attachments: Disk_IO_Scheduling_Design_1.pdf, > Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, > YARN-2139-prototype.patch > > > YARN should support considering disk for scheduling tasks on nodes, and > provide isolation for these allocations at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2139) Add support for disk IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2139: --- Assignee: (was: Wei Yan) > Add support for disk IO isolation/scheduling for containers > --- > > Key: YARN-2139 > URL: https://issues.apache.org/jira/browse/YARN-2139 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wei Yan > Attachments: Disk_IO_Scheduling_Design_1.pdf, > Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, > YARN-2139-prototype.patch > > > YARN should support considering disk for scheduling tasks on nodes, and > provide isolation for these allocations at runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4
[ https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2811: -- Attachment: YARN-2811.v5.patch > Fair Scheduler is violating max memory settings in 2.4 > -- > > Key: YARN-2811 > URL: https://issues.apache.org/jira/browse/YARN-2811 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, > YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch > > > This has been seen on several queues showing the allocated MB going > significantly above the max MB and it appears to have started with the 2.4 > upgrade. It could be a regression bug from 2.0 to 2.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2817) Disk drive as a resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206981#comment-14206981 ] Karthik Kambatla commented on YARN-2817: Resolving this as a duplicate of YARN-2791 since that JIRA proposes the exact same thing. In any case, I think these should be subtasks for YARN-2139, looking at the prototype code posted there. > Disk drive as a resource in YARN > > > Key: YARN-2817 > URL: https://issues.apache.org/jira/browse/YARN-2817 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > As YARN continues to cover new ground in terms of new workloads, disk is > becoming a very important resource to govern. > It might be prudent to start with something very simple - allow applications > to request entire drives (e.g. 2 drives out of the 12 available on a node), > we can then also add support for specific iops, bandwidth etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2817) Disk drive as a resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-2817. Resolution: Duplicate > Disk drive as a resource in YARN > > > Key: YARN-2817 > URL: https://issues.apache.org/jira/browse/YARN-2817 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > As YARN continues to cover new ground in terms of new workloads, disk is > becoming a very important resource to govern. > It might be prudent to start with something very simple - allow applications > to request entire drives (e.g. 2 drives out of the 12 available on a node), > we can then also add support for specific iops, bandwidth etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2423: Target Version/s: 2.7.0 > TimelineClient should wrap all GET APIs to facilitate Java users > > > Key: YARN-2423 > URL: https://issues.apache.org/jira/browse/YARN-2423 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Robert Kanter > Attachments: YARN-2423.patch, YARN-2423.patch, YARN-2423.patch > > > TimelineClient provides the Java method to put timeline entities. It's also > good to wrap over all GET APIs (both entity and domain), and deserialize the > json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206939#comment-14206939 ] Karthik Kambatla commented on YARN-1964: [~raviprak] - I would prefer backporting it to branch-2.6 only if it goes into 2.6.0 release, so we can avoid including features in point releases. In any case, the plan is to release 2.7.0 soon after 2.6.0. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206855#comment-14206855 ] Ravi Prakash commented on YARN-1964: Thanks Abin! The patch is looking really good now. However the documentation doesn't seem to be compiling for me. Once that is figured out, I'm a +1. I am looking to commit it EOD today to trunk, branch-2, branch-2.6. I'd like to commit it to 2.6 also and request a respin of the RC. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206801#comment-14206801 ] Jason Lowe commented on YARN-2847: -- The problem is in this code: {code} char **banned_users = get_values(BANNED_USERS_KEY); char **banned_user = (banned_users == NULL) ? (char**) DEFAULT_BANNED_USERS : banned_users; for(; *banned_user; ++banned_user) { if (strcmp(*banned_user, user) == 0) { free(user_info); if (banned_users != (char**)DEFAULT_BANNED_USERS) { free_values(banned_users); } fprintf(LOGFILE, "Requested user %s is banned\n", user); return NULL; } } if (banned_users != NULL && banned_users != (char**)DEFAULT_BANNED_USERS) { free_values(banned_users); } {code} Note that in one case we check for banned_users != NULL and != DEFAULT_BANNED_USERS but in another case we're missing the NULL check. Lots of ways to fix it: - free_values could check for NULL - banned_users could always be non-NULL (i.e.: set it to DEFAULT_BANNED_USERS if get_values returns NULL) - add check for != NULL before calling free_values > Linux native container executor segfaults if default banned user detected > - > > Key: YARN-2847 > URL: https://issues.apache.org/jira/browse/YARN-2847 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Jason Lowe > > The check_user function in container-executor.c can cause a segmentation > fault if banned.users is not provided but the user is detected as one of the > default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2847) Linux native container executor segfaults if default banned user detected
Jason Lowe created YARN-2847: Summary: Linux native container executor segfaults if default banned user detected Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206788#comment-14206788 ] Hudson commented on YARN-2735: -- FAILURE: Integrated in Hadoop-trunk-Commit #6510 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6510/]) YARN-2735. diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection. (Zhihai Xu via kasha) (kasha: rev 061bc293c8dd3e2605cf150568988bde18407af6) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection > --- > > Key: YARN-2735 > URL: https://issues.apache.org/jira/browse/YARN-2735 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Trivial > Labels: newbie > Fix For: 2.7.0 > > Attachments: YARN-2735.000.patch > > > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206768#comment-14206768 ] Hadoop QA commented on YARN-1964: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680823/YARN-1964.patch against trunk revision 58e9bf4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5815//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5815//console This message is automatically generated. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206765#comment-14206765 ] Karthik Kambatla commented on YARN-2735: Trivial patch. +1. Checking this in. > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection > --- > > Key: YARN-2735 > URL: https://issues.apache.org/jira/browse/YARN-2735 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Trivial > Labels: newbie > Attachments: YARN-2735.000.patch > > > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection
[ https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2735: --- Priority: Trivial (was: Minor) Labels: newbie (was: ) > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection > --- > > Key: YARN-2735 > URL: https://issues.apache.org/jira/browse/YARN-2735 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Trivial > Labels: newbie > Attachments: YARN-2735.000.patch > > > diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are > initialized twice in DirectoryCollection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1680: -- Target Version/s: 2.7.0 (was: 2.6.0) > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Craig Welch > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > A job is running; its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are running in the cluster now. > The MRAppMaster does not preempt the reducers because, for the reducer preemption > calculation, the headroom includes the blacklisted node's memory. This makes > jobs hang forever (the ResourceManager does not assign any new containers on > blacklisted nodes but returns an availableResources value that counts the whole cluster's free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
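A toy, hedged calculation of the adjustment the YARN-1680 summary asks for, using only the numbers from the description above (4 NMs of 8 GB, 29 GB in use, NM-4 blacklisted). The variable names and the simple subtraction are illustrative assumptions, not the actual scheduler code.
{code:java}
// Hedged sketch, not YARN source: why raw headroom that counts a blacklisted
// node's free memory can leave the AM waiting forever.
public class HeadroomSketch {
  public static void main(String[] args) {
    long clusterTotalMb = 4 * 8 * 1024;            // 4 NodeManagers x 8 GB = 32 GB
    long usedMb = 29 * 1024;                       // reducers currently hold 29 GB
    long rawHeadroomMb = clusterTotalMb - usedMb;  // 3 GB: what the AM is told today

    // Assumption for illustration: the remaining free memory sits on blacklisted NM-4,
    // where the RM will never place new containers for this application.
    long freeOnBlacklistedMb = rawHeadroomMb;

    // Adjusted view proposed by the summary: exclude the blacklisted node's free memory.
    long adjustedHeadroomMb = rawHeadroomMb - freeOnBlacklistedMb;

    System.out.println("raw headroom (MB)      = " + rawHeadroomMb);       // 3072
    System.out.println("adjusted headroom (MB) = " + adjustedHeadroomMb);  // 0
  }
}
{code}
With the adjusted headroom at zero, the MRAppMaster would see that it must preempt reducers rather than wait for containers the blacklisted node will never provide.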
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: YARN-1964.patch fixed imports. > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In context of YARN, the support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2009) Priority support for preemption in ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206675#comment-14206675 ] Sunil G commented on YARN-2009: --- Yes. The idea is more or less the same. We had a prototype done on this, and along with ApplicationPriority, this can be brought in as a separate policy. However, points to discuss are: * Containers have to be selected from lower-priority applications based on the node locality constraints of the higher-priority application * Co-existence of this logic with node-labels * The AM container has to be spared, etc. Please share your thoughts. > Priority support for preemption in ProportionalCapacityPreemptionPolicy > --- > > Key: YARN-2009 > URL: https://issues.apache.org/jira/browse/YARN-2009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Devaraj K >Assignee: Sunil G > > While preempting containers based on the queue ideal assignment, we may need > to consider preempting the low priority application containers first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206671#comment-14206671 ] Zhijie Shen commented on YARN-2838: --- [~Naganarasimha], sorry for not responding to you immediately; I've been busy finalizing 2.6. I did a quick scan through your issue document. Here's my clarification: 1. While the entry point of this sub-module is still called ApplicationHistoryServer, it is actually generalized to be TimelineServer right now (definitely we need to refactor the code at some point). The baseline service provided by the timeline server is to allow the cluster and its apps to store their history information, metrics and so on by complying with the defined timeline data model. Later on, users and admins can query this information to do their analysis. 2. Application history (or, as we prefer to call it, the generic history service) is now a built-in service in the timeline server to record the generic history information of YARN apps. It was on a separate store (on FS), but after YARN-2033, it has been moved to the timeline store too, as a payload. We replaced the old storage layer, but kept the existing interfaces (web UI, services, CLI) unchanged, to be the analog of what the RM provides for running apps. We still haven't integrated TimelineClient and AHSClient, the latter of which is the RPC interface for getting generic history information. APPLICATION_HISTORY_ENABLED is the only remaining old config to control whether we also want to pull the app info from the generic history service inside the timeline server. You may want to take a look at YARN-2033 to get more context about the change. Moreover, because of a number of limitations of the old history store, we're no longer going to support it. 3. The document is definitely stale. I'll file a separate documentation JIRA; however, it's too late for 2.6. Let's target 2.7 for an up-to-date document about the timeline service and its built-in generic history service. Does that sound good? > Issues with TimeLineServer (Application History) > > > Key: YARN-2838 > URL: https://issues.apache.org/jira/browse/YARN-2838 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: IssuesInTimelineServer.pdf > > > Few issues in usage of Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage priority labels
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0001-YARN-2693.patch Uploading a work-in-progress patch for the priority label manager. * Supports file system and memory stores * Handles 4 store events: add_labels_to_queue, remove_labels_from_queue, store_cluster_labels, remove_cluster_labels * Uses specific PB impls to store the label details to file * Design is similar to the node label manager, but with changes in event-specific handling * An RMPriorityLabelManager class has to sit on top of this as a wrapper, which we can bring up with the RM core changes Kindly review and provide major comments. I will keep updating this with tests as well. > Priority Label Manager in RM to manage priority labels > -- > > Key: YARN-2693 > URL: https://issues.apache.org/jira/browse/YARN-2693 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2693.patch > > > Focus of this JIRA is to have a centralized service to handle priority labels. > Support operations such as > * Add/Delete priority label to a specified queue > * Manage integer mapping associated with each priority label > * Support managing default priority label of a given queue > * ACL support in queue level for priority label > * Expose interface to RM to validate priority label > Storage for these labels will be done in FileSystem and in Memory, similar to > NodeLabel > * FileSystem Based : persistent across RM restart > * Memory Based: non-persistent across RM restart -- This message was sent by Atlassian JIRA (v6.3.4#6332)
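As a rough, hypothetical sketch of the event surface described in the comment above (the type and interface names are invented for illustration; the real code is in the attached 0001-YARN-2693.patch):
{code:java}
import java.io.IOException;
import java.util.Set;

// Hypothetical sketch only; names do not come from the attached patch.
public class PriorityLabelStoreSketch {

  // The four store events mentioned in the comment above.
  enum PriorityLabelStoreEventType {
    ADD_LABELS_TO_QUEUE,
    REMOVE_LABELS_FROM_QUEUE,
    STORE_CLUSTER_LABELS,
    REMOVE_CLUSTER_LABELS
  }

  // A store could be file-system backed (persistent across RM restart) or
  // in-memory (non-persistent), mirroring the NodeLabel manager design.
  interface PriorityLabelStore {
    void handle(PriorityLabelStoreEventType type, String queue, Set<String> labels)
        throws IOException;
  }
}
{code}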
[jira] [Commented] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206634#comment-14206634 ] Naganarasimha G R commented on YARN-2838: - Hi [~zjshen], Can you please feedback on these issues ? As some issues requires discussion before rectifiction... > Issues with TimeLineServer (Application History) > > > Key: YARN-2838 > URL: https://issues.apache.org/jira/browse/YARN-2838 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.5.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: IssuesInTimelineServer.pdf > > > Few issues in usage of Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206626#comment-14206626 ] Naganarasimha G R commented on YARN-2495: - {quote} The benefit are 1) You don't have to update test cases for that 2) The semanic are clear, create a register request with label or not. {quote} True, and I will be able to revert some unwanted testcase modifications. Have corrected it. bq. I suggest to have different option for script-based/config-based, even if we can combine them together. Ok, will have different config params for script-based and config-based. bq. IIUC, NM_NODE_LABELS_FROM_CONFIG is a list of labels, even if we want to separate the two properties, we cannot remove NM_NODE_LABELS_FROM_CONFIG, correct? I had searched for it wrongly, and as you mentioned, the name was not good enough for me to recollect it either. Corrected it. bq. I think it's better to leverage existing utility class instead of implement your own. For example, you have set values but not check them, which is incorrect, but using utility class can avoid such problem. Even if you added new fields, tests will cover them without any changes: The problem is that ??TestPBImplRecords?? is in the ??hadoop-yarn-common?? project while ??NodeHeartbeatRequestPBImpl?? and the others are in the ??hadoop-yarn-server-common?? project. Since we can't add a dependency on ??hadoop-yarn-server-common?? in ??hadoop-yarn-common??, shall I create a new class extending TestPBImplRecords in the ??hadoop-yarn-server-common?? project? > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, > YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml or using script > suggested by [~aw]) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206616#comment-14206616 ] Jason Lowe commented on YARN-2846: -- Thanks for the report and patch, Junping! Nit: If reacquireContainer is going to allow InterruptedException to be thrown then I'd rather remove the try/catch around the Thread.sleep call and just let the exception be thrown directly from there. We can let the code catching the exception deal with any logging/etc as appropriate for that caller. In this case we can move the log message to RecoveredContainerLaunch when it fields the InterruptedException and chooses not to propagate it upwards. I'm curious why we're not seeing a similar issue with regular ContainerLaunch threads, as they should be interrupted as well. Are those threads silently swallowing the interrupt? Because otherwise I would expect us to log a container completion just like we were doing with a recovered container. > Incorrect persist exit code for running containers in reacquireContainer() > that interrupted by NodeManager restart. > --- > > Key: YARN-2846 > URL: https://issues.apache.org/jira/browse/YARN-2846 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2846-demo.patch > > > The NM restart work preserving feature could make running AM container get > LOST and killed during stop NM daemon. The exception is like below: > {code} > 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for > container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB > physical memory used; 931.3 MB of 1.0 GB virtual memory used > 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager > (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM > 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 > 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - > Applications still running : [application_1415666714233_0001] > 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 45454 > 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping > IPC Server listener on 45454 > 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(141)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping > IPC Server Responder > 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log > aggregation for application_1415666714233_0001 > 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for > application application_1415666714233_0001 > 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(476)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. 
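To make the nit above concrete, here is a minimal, hedged sketch of the suggested flow; the class and helper names are simplified stand-ins, not the actual ContainerExecutor or RecoveredContainerLaunch source:
{code:java}
import java.io.IOException;

// Hedged sketch only; names are stand-ins for the real NM classes.
public class ReacquireSketch {

  // Analogous to ContainerExecutor.reacquireContainer(): no try/catch around
  // Thread.sleep, so an interrupt from NM shutdown surfaces directly instead of
  // being wrapped into an IOException.
  static int waitForExit(String pid) throws IOException, InterruptedException {
    while (processAlive(pid)) {
      Thread.sleep(1000);
    }
    return readExitCodeFile(pid);
  }

  // Analogous to RecoveredContainerLaunch.call(): only a real failure (IOException)
  // records an exit code; an interrupt is logged and the container state is left
  // alone so the next recovery attempt can pick it up.
  static void recoverContainer(String pid) {
    try {
      persistExitCode(pid, waitForExit(pid));
    } catch (InterruptedException e) {
      System.err.println("Interrupted while reacquiring " + pid + "; not recording an exit code");
      Thread.currentThread().interrupt();
    } catch (IOException e) {
      persistExitCode(pid, 154 /* LOST, per the description above */);
    }
  }

  // Hypothetical helpers, stubbed to keep the sketch self-contained.
  static boolean processAlive(String pid) { return false; }
  static int readExitCodeFile(String pid) { return 0; }
  static void persistExitCode(String pid, int code) { }
}
{code}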
> 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch > (RecoveredContainerLaunch.java:call(87)) - Unable to recover container > container_1415666714233_0001_01_01 > java.io.IOException: Interrupted while waiting for process 20001 to exit > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) > ... 6 more > {code} > In reacquireCo
[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206601#comment-14206601 ] Hadoop QA commented on YARN-2846: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680801/YARN-2846-demo.patch against trunk revision 58e9bf4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5814//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5814//console This message is automatically generated. > Incorrect persist exit code for running containers in reacquireContainer() > that interrupted by NodeManager restart. > --- > > Key: YARN-2846 > URL: https://issues.apache.org/jira/browse/YARN-2846 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2846-demo.patch > > > The NM restart work preserving feature could make running AM container get > LOST and killed during stop NM daemon. 
The exception is like below: > {code} > 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for > container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB > physical memory used; 931.3 MB of 1.0 GB virtual memory used > 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager > (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM > 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 > 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - > Applications still running : [application_1415666714233_0001] > 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 45454 > 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping > IPC Server listener on 45454 > 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(141)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping > IPC Server Responder > 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log > aggregation for application_1415666714233_0001 > 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for > application application_1415666714233_0001 > 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(476)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. > 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch > (RecoveredContainerLaunch.java:call(87)) - Unable to recover container > container_1415666714233_0001_01_01 > java.io.IOException: Interrupted while waiting for process 20001 to exit > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) > at
[jira] [Updated] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2846: - Attachment: YARN-2846-demo.patch Upload the first demo patch to fix the problem. > Incorrect persist exit code for running containers in reacquireContainer() > that interrupted by NodeManager restart. > --- > > Key: YARN-2846 > URL: https://issues.apache.org/jira/browse/YARN-2846 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2846-demo.patch > > > The NM restart work preserving feature could make running AM container get > LOST and killed during stop NM daemon. The exception is like below: > {code} > 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for > container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB > physical memory used; 931.3 MB of 1.0 GB virtual memory used > 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager > (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM > 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 > 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - > Applications still running : [application_1415666714233_0001] > 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 45454 > 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping > IPC Server listener on 45454 > 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(141)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping > IPC Server Responder > 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log > aggregation for application_1415666714233_0001 > 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for > application application_1415666714233_0001 > 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(476)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. 
> 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch > (RecoveredContainerLaunch.java:call(87)) - Unable to recover container > container_1415666714233_0001_01_01 > java.io.IOException: Interrupted while waiting for process 20001 to exit > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) > ... 6 more > {code} > In reacquireContainer() of ContainerExecutor.java, the while loop of checking > container process (AM container) will be interrupted by NM stop. The > IOException get thrown and failed to generate an ExitCodeFile for the running > container. Later, the IOException will be caught in upper call > (RecoveredContainerLaunch.call()) and the ExitCode (by default to be LOST > without any setting) get persistent in NMStateStore. > After NM restart again, this container is recovered as COMPLETE state but > exit code is LOST (154) - cause this (AM) container get killed later. > We should get rid of recording the exit code of running containers if > detecting process is interrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
[ https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-2846: Assignee: Junping Du > Incorrect persist exit code for running containers in reacquireContainer() > that interrupted by NodeManager restart. > --- > > Key: YARN-2846 > URL: https://issues.apache.org/jira/browse/YARN-2846 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > > The NM restart work preserving feature could make running AM container get > LOST and killed during stop NM daemon. The exception is like below: > {code} > 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for > container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB > physical memory used; 931.3 MB of 1.0 GB virtual memory used > 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager > (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM > 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 > 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl > (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - > Applications still running : [application_1415666714233_0001] > 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 45454 > 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping > IPC Server listener on 45454 > 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService > (LogAggregationService.java:serviceStop(141)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService > waiting for pending aggregation during exit > 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping > IPC Server Responder > 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log > aggregation for application_1415666714233_0001 > 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for > application application_1415666714233_0001 > 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(476)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. 
> 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch > (RecoveredContainerLaunch.java:call(87)) - Unable to recover container > container_1415666714233_0001_01_01 > java.io.IOException: Interrupted while waiting for process 20001 to exit > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) > ... 6 more > {code} > In reacquireContainer() of ContainerExecutor.java, the while loop of checking > container process (AM container) will be interrupted by NM stop. The > IOException get thrown and failed to generate an ExitCodeFile for the running > container. Later, the IOException will be caught in upper call > (RecoveredContainerLaunch.call()) and the ExitCode (by default to be LOST > without any setting) get persistent in NMStateStore. > After NM restart again, this container is recovered as COMPLETE state but > exit code is LOST (154) - cause this (AM) container get killed later. > We should get rid of recording the exit code of running containers if > detecting process is interrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.
Junping Du created YARN-2846: Summary: Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart. Key: YARN-2846 URL: https://issues.apache.org/jira/browse/YARN-2846 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Priority: Blocker The NM restart work preserving feature could make running AM container get LOST and killed during stop NM daemon. The exception is like below: {code} 2014-11-11 00:48:35,214 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB physical memory used; 931.3 MB of 1.0 GB virtual memory used 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM 2014-11-11 00:48:35,299 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060 2014-11-11 00:48:35,337 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - Applications still running : [application_1415666714233_0001] 2014-11-11 00:48:35,338 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 45454 2014-11-11 00:48:35,344 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 45454 2014-11-11 00:48:35,346 INFO logaggregation.LogAggregationService (LogAggregationService.java:serviceStop(141)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit 2014-11-11 00:48:35,347 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder 2014-11-11 00:48:35,347 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log aggregation for application_1415666714233_0001 2014-11-11 00:48:35,348 WARN logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for application application_1415666714233_0001 2014-11-11 00:48:35,358 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(476)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(87)) - Unable to recover container container_1415666714233_0001_01_01 java.io.IOException: Interrupted while waiting for process 20001 to exit at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177) ... 
6 more {code} In reacquireContainer() of ContainerExecutor.java, the while loop of checking container process (AM container) will be interrupted by NM stop. The IOException get thrown and failed to generate an ExitCodeFile for the running container. Later, the IOException will be caught in upper call (RecoveredContainerLaunch.call()) and the ExitCode (by default to be LOST without any setting) get persistent in NMStateStore. After NM restart again, this container is recovered as COMPLETE state but exit code is LOST (154) - cause this (AM) container get killed later. We should get rid of recording the exit code of running containers if detecting process is interrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206536#comment-14206536 ] Hudson commented on YARN-2841: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1954 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1954/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt > RMProxy should retry EOFException > -- > > Key: YARN-2841 > URL: https://issues.apache.org/jira/browse/YARN-2841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jian He >Assignee: Jian He >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-2841.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2780) Log aggregated resource allocation in rm-appsummary.log
[ https://issues.apache.org/jira/browse/YARN-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206532#comment-14206532 ] Jason Lowe commented on YARN-2780: -- +1 lgtm. Will commit this later today if there are no objections. > Log aggregated resource allocation in rm-appsummary.log > --- > > Key: YARN-2780 > URL: https://issues.apache.org/jira/browse/YARN-2780 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Koji Noguchi >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-2780.v1.201411031728.txt, > YARN-2780.v2.201411061601.txt > > > YARN-415 added useful information about resource usage by applications. > Asking to log that info inside rm-appsummary.log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206459#comment-14206459 ] Hudson commented on YARN-2841: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1930 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1930/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt > RMProxy should retry EOFException > -- > > Key: YARN-2841 > URL: https://issues.apache.org/jira/browse/YARN-2841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jian He >Assignee: Jian He >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-2841.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206347#comment-14206347 ] Hudson commented on YARN-2841: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #740 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/740/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt > RMProxy should retry EOFException > -- > > Key: YARN-2841 > URL: https://issues.apache.org/jira/browse/YARN-2841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jian He >Assignee: Jian He >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-2841.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2841) RMProxy should retry EOFException
[ https://issues.apache.org/jira/browse/YARN-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206336#comment-14206336 ] Hudson commented on YARN-2841: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #2 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/2/]) YARN-2841. RMProxy should retry EOFException. Contributed by Jian He (xgong: rev 5c9a51f140ba76ddb25580aeb288db25e3f9653f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java YARN-2841: Correct fix version from branch-2.6 to branch-2.7 in the (xgong: rev 58e9bf4b908e0b21309006eba49899b092f38071) * hadoop-yarn-project/CHANGES.txt > RMProxy should retry EOFException > -- > > Key: YARN-2841 > URL: https://issues.apache.org/jira/browse/YARN-2841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jian He >Assignee: Jian He >Priority: Critical > Fix For: 2.7.0 > > Attachments: YARN-2841.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2845) MicroZookeeperService used in Yarn Registry tests doesn't shut down cleanly on Windows
Steve Loughran created YARN-2845: Summary: MicroZookeeperService used in Yarn Registry tests doesn't shut down cleanly on Windows Key: YARN-2845 URL: https://issues.apache.org/jira/browse/YARN-2845 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: Windows Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 2.7.0 It's not surfacing in YARN's own tests, but we are seeing this in Slider's Windows testing ... two test methods, each setting up its own ZK micro cluster, see the previous test's data. The class needs the same cleanup logic as HBASE-6820, as perhaps does its origin, Twill's mini ZK cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
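Since the report is about test-to-test data leakage, here is a minimal, hypothetical sketch of the kind of per-test cleanup it asks for: each test method gets a fresh, uniquely named ZK data directory and removes it after stopping its mini cluster. The class and method names are illustrative only and are not the MicroZookeeperService API; the actual fix would live in the Yarn Registry / Slider test utilities.
{code:java}
import java.io.File;
import java.io.IOException;
import java.util.UUID;

import org.apache.commons.io.FileUtils;
import org.junit.After;
import org.junit.Before;

// Hypothetical base class: isolates every test method behind its own
// ZooKeeper data directory so a second micro cluster can never observe
// the previous test's snapshots and transaction logs.
public abstract class AbstractZKMicroClusterTest {
  private File zkDataDir;

  @Before
  public void createFreshZkDataDir() {
    // Unique directory per test method avoids reuse of stale ZK data.
    zkDataDir = new File(System.getProperty("java.io.tmpdir"),
        "zk-micro-" + UUID.randomUUID());
    zkDataDir.mkdirs();
  }

  @After
  public void deleteZkDataDir() throws IOException {
    // On Windows the directory can only be removed once the ZK server
    // has released its file handles, so stop the cluster first.
    stopZkCluster();
    FileUtils.deleteDirectory(zkDataDir);
  }

  protected File getZkDataDir() {
    return zkDataDir;
  }

  /** Subclasses shut down their mini ZK cluster here (illustrative hook). */
  protected abstract void stopZkCluster();
}
{code}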
[jira] [Commented] (YARN-2844) WebAppProxyServlet cannot handle urls which contain encoded characters
[ https://issues.apache.org/jira/browse/YARN-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206157#comment-14206157 ] Hadoop QA commented on YARN-2844: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680759/YARN-2844.patch against trunk revision 58e9bf4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5813//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5813//console This message is automatically generated. > WebAppProxyServlet cannot handle urls which contain encoded characters > -- > > Key: YARN-2844 > URL: https://issues.apache.org/jira/browse/YARN-2844 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Shixiong Zhu >Priority: Minor > Attachments: YARN-2844.patch > > > WebAppProxyServlet has a bug in its URL encoding/decoding. This was found when > running Spark on Yarn. > When a user accesses > "http://example.com:8088/proxy/application_1415344371838_0006/executors/threadDump/?executorId=%3Cdriver%3E", > WebAppProxyServlet will request > "http://example.com:36429/executors/threadDump/?executorId=%25253Cdriver%25253E". > But the Spark Web Server expects > "http://example.com:36429/executors/threadDump/?executorId=%3Cdriver%3E". > Here are the problems I found in WebAppProxyServlet. > 1. java.net.URI.toString returns an encoded URL string. So the following code > in WebAppProxyServlet should use `true` instead of `false`. > {code:java} > org.apache.commons.httpclient.URI uri = > new org.apache.commons.httpclient.URI(link.toString(), false); > {code} > 2. > [HttpServletRequest.getPathInfo()|https://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getPathInfo()] > will return a decoded string. Therefore, if the link is > http://example.com:8088/proxy/application_1415344371838_0006/John%2FHunter, > pathInfo will be "/application_1415344371838_0006/John/Hunter". Then the URI > created in WebAppProxyServlet will be something like ".../John/Hunter", but > the correct link should be ".../John%2FHunter". We can use > [HttpServletRequest.getRequestURI()|https://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getRequestURI()] > to get the raw path. > {code:java} > final String pathInfo = req.getPathInfo(); > {code} > 3. The wrong URI constructor is used. 
[URI(String scheme, String authority, String > path, String query, String > fragment)|https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)] > will encode the path and query which have already been encoded. Should use > [URI(String > str)|https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String)] > directly since the url has already been encoded. > {code:java} > URI toFetch = new URI(trackingUri.getScheme(), > trackingUri.getAuthority(), > StringHelper.ujoin(trackingUri.getPath(), rest), > req.getQueryString(), > null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
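To make the double-encoding problem above concrete, here is a small, self-contained Java sketch (an illustration only, not the WebAppProxyServlet code or the attached patch) contrasting the multi-argument java.net.URI constructor, which re-quotes an already percent-encoded query, with the single-argument constructor, which takes the string as-is. The host, port and query are the hypothetical values from the description; the sketch shows one extra layer of encoding, whereas the servlet ends up adding it twice, hence the %25253C seen above.
{code:java}
import java.net.URI;
import java.net.URISyntaxException;

// Demonstrates why building a URI from pre-encoded components double-encodes.
public class ProxyEncodingDemo {
  public static void main(String[] args) throws URISyntaxException {
    // "<driver>" already percent-encoded once by the client.
    String encodedQuery = "executorId=%3Cdriver%3E";

    // The multi-argument constructor treats its arguments as decoded text
    // and always quotes '%', so %3C becomes %253C.
    URI reEncoded = new URI("http", "example.com:36429",
        "/executors/threadDump/", encodedQuery, null);
    System.out.println(reEncoded);
    // -> http://example.com:36429/executors/threadDump/?executorId=%253Cdriver%253E

    // The single-argument constructor parses the already-encoded string
    // as-is, preserving the original single round of encoding.
    URI asIs = new URI(
        "http://example.com:36429/executors/threadDump/?" + encodedQuery);
    System.out.println(asIs);
    // -> http://example.com:36429/executors/threadDump/?executorId=%3Cdriver%3E
  }
}
{code}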